Install #36
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
for more information, see https://pre-commit.ci
ekmb
approved these changes
Feb 7, 2023
BuyuanCui
pushed a commit
to BuyuanCui/NeMo-text-processing
that referenced
this pull request
Jul 6, 2023
* remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
mgrafu
pushed a commit
that referenced
this pull request
Jul 18, 2023
* remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
BuyuanCui
pushed a commit
that referenced
this pull request
Sep 26, 2024
* remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com>
mgrafu
added a commit
that referenced
this pull request
Oct 1, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal changes will change back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn date Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving conflict Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases Signed-off-by: Alex Cui <alexcui1994@gmail.com> * updats on Jenkins Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jenkinspdate Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding one more test item Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixed typo on decimaltext Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed 
grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unused import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changed regular space to narrow space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports error fixing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports errors Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Jekins update for jp itn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * reverting Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixng style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jp tn date update Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * removing previously created nemo imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * test order arrangement Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolve fraction space issue Signed-off-by: Alex Cui 
<alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * empty file Signed-off-by: Alex Cui <alexcui1994@gmail.com> * to delete Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * add Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add jenkins file (#23) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * force ordinals to either have :a/:e or "." 
at the end Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal ordinal data Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add // to symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix language Signed-off-by: Jim O'Regan <joregan@kth.se> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for 
measure, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a pair of test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix plurals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add usd$ Signed-off-by: Jim O'Regan <joregan@kth.se> * insert "komma" Signed-off-by: Jim O'Regan <joregan@kth.se> * "pund" is neuter Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * towards proper graphs Signed-off-by: Jim O'Regan <joregan@kth.se> * GBP Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * make komma non-det Signed-off-by: Jim O'Regan <joregan@kth.se> * more money tagger fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <joregan@kth.se> * do a bit better with en/ett Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <joregan@kth.se> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * expansions of era abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras in verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * fix examples in comment Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <joregan@kth.se> * fix separator Signed-off-by: Jim O'Regan <joregan@kth.se> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <joregan@kth.se> * load labels Signed-off-by: Jim O'Regan <joregan@kth.se> * right first time Signed-off-by: Jim O'Regan <joregan@kth.se> * missing space Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year in test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * getting closer to getting dates working Signed-off-by: Jim O'Regan <joregan@kth.se> * add a (failing) test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <joregan@kth.se> * also handle decades Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add partially incomplete test data Signed-off-by: Jim O'Regan <joregan@kth.se> * mostly fixed test cases 
Signed-off-by: Jim O'Regan <joregan@kth.se> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <joregan@kth.se> * missed wrapping Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. Signed-off-by: Jim O'Regan <joregan@kth.se> * telephone tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * try adding more brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <joregan@kth.se> * move abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add in abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <joregan@kth.se> * single digit Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "this is not right; leading 
zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, this seems to work Signed-off-by: Jim O'Regan <joregan@kth.se> * drop the tests starting with comma Signed-off-by: Jim O'Regan <joregan@kth.se> * decimal tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * lower case Signed-off-by: Jim O'Regan <joregan@kth.se> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a very minimal test case for time Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <joregan@kth.se> * add prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * copy the roman handling from es Signed-off-by: Jim O'Regan <joregan@kth.se> * greek letters Signed-off-by: Jim O'Regan <joregan@kth.se> * some fixes to the time tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on time Signed-off-by: Jim O'Regan <joregan@kth.se> * |=, not = Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt verbaliser a little Signed-off-by: Jim O'Regan <joregan@kth.se> * add some test cases from module comments Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables to check Signed-off-by: Jim O'Regan <joregan@kth.se> * small fix Signed-off-by: Jim O'Regan <joregan@kth.se> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <joregan@kth.se> * try doing this here Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix errors in tests Signed-off-by: Jim O'Regan <joregan@kth.se> * minimal test cases for measure Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <joregan@kth.se> * merge different tsvs Signed-off-by: Jim O'Regan <joregan@kth.se> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables for testing Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * need an en/ett split here too Signed-off-by: Jim O'Regan <joregan@kth.se> * fix decimal subgraph Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo, I've just done it Signed-off-by: Jim O'Regan <joregan@kth.se> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek letters in maths Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek here too Signed-off-by: Jim O'Regan <joregan@kth.se> * minor sg/pl Signed-off-by: Jim O'Regan <joregan@kth.se> * dedup Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * put these under if, too Signed-off-by: Jim O'Regan <joregan@kth.se> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <joregan@kth.se> * export variables to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * 
here is one error Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <joregan@kth.se> * export a variable Signed-off-by: Jim O'Regan <joregan@kth.se> * add a tesst case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * . is not a cardinal separator Signed-off-by: Jim O'Regan <joregan@kth.se> * fix case Signed-off-by: Jim O'Regan <joregan@kth.se> * add yen Signed-off-by: Jim O'Regan <joregan@kth.se> * final fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove English roman tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * remove some unused pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <joregan@kth.se> * warnings about missing whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * add sv Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan 
<joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year Signed-off-by: Jim O'Regan <joregan@kth.se> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <joregan@kth.se> * address codeql comments Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <joregan@kth.se> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <joregan@kth.se> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <joregan@kth.se> * remove broken duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <joregan@kth.se> * time tests now pass Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <joregan@kth.se> * import delete_preserve_order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <joregan@kth.se> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <joregan@kth.se> * move to the correct subdirectory Signed-off-by: Jim O'Regan <joregan@kth.se> * add swedish Signed-off-by: Jim O'Regan <joregan@kth.se> * remove pynini checks Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix here also Signed-off-by: Jim O'Regan <joregan@kth.se> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * add a date case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove duplication 
Signed-off-by: Jim O'Regan <joregan@kth.se> * boost n_tagged Signed-off-by: Jim O'Regan <joregan@kth.se> * also copyright this year Signed-off-by: Jim O'Regan <joregan@kth.se> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <joregan@kth.se> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <joregan@kth.se> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <joregan@kth.se> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * days of the week Signed-off-by: Jim O'Regan <joregan@kth.se> * add more abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove blank line Signed-off-by: Jim O'Regan <joregan@kth.se> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <joregan@kth.se> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CI setup (#25) * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci _cr Signed-off-by: ekmb <ebakhturina@nvidia.com> * revert setup tool Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove pytest-runner from setup.py Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update 
test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip Signed-off-by: ekmb <ebakhturina@nvidia.com> * electronic pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * test pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove unused imports Signed-off-by: ekmb <ebakhturina@nvidia.com> * add deterministic option normalized options Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins grammar folder Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up, update for SH Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * reduce cardinal graph Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * add weight for sh Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Tn en astronomical no (#28) * Add 
support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix stage Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <ebakhturina@nvidia.com> * add whitelist to export Signed-off-by: ekmb <ebakhturina@nvidia.com> * update docstrings Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix for measures Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> --------- Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang 
<yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.6rc0 (#37) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run language tests in stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update DE cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <joregan@kth.se> 
Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix telephone, ordinal Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * update electronic Signed-off-by: ekmb <ebakhturina@nvidia.com> * review feedback, update whitelist Signed-off-by: ekmb <ebakhturina@nvidia.com> * rename capitalize func Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix SH tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins folder name Signed-off-by: ekmb <ebakhturina@nvidia.com> * added cased arg to ITN Signed-off-by: ekmb <ebakhturina@nvidia.com> * add input_case arg to other lang Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dirs update Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix codeql errors Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix sh Signed-off-by: ekmb <ebakhturina@nvidia.com> * review Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder for EN Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * extend alignment for itn Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added test to pr doc Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <joregan@kth.se> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <joregan@kth.se> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan 
<joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix sv tests (#52) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.7 release (#53) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update Jenkinsfile Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for quantities Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * change integer Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <joregan@kth.se> * superscript to superessive Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes 
from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * fix var Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum electronic test Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <joregan@kth.se> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add some alternative measure forms Signed-off-by: Jim O'Regan <joregan@kth.se> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal time test Signed-off-by: Jim O'Regan <joregan@kth.se> * will want cardinal here Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <joregan@kth.se> * move two letters Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * small changes Signed-off-by: Jim O'Regan <joregan@kth.se> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * other ways of reading w Signed-off-by: Jim O'Regan <joregan@kth.se> * for non deterministic, a bunch of these symbols 
can be read as letters Signed-off-by: Jim O'Regan <joregan@kth.se> * currency Signed-off-by: Jim O'Regan <joregan@kth.se> * more inflection Signed-off-by: Jim O'Regan <joregan@kth.se> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * working now, add a comment Signed-off-by: Jim O'Regan <joregan@kth.se> * also integer, and preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * also accept the full words Signed-off-by: Jim O'Regan <joregan@kth.se> * deduplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt comments Signed-off-by: Jim O'Regan <joregan@kth.se> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <joregan@kth.se> * duplicate space Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * actually saving the adaptations Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim 
O'Regan <joregan@kth.se> * remove pynini checks from tests Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix cache dir Signed-off-by: Jim O'Regan <joregan@kth.se> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add basic tests (native verified) Signed-off-by: Jim O'Regan <joregan@kth.se> * add components for read digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add an example with a different separator Signed-off-by: Jim O'Regan <joregan@kth.se> * start adapting Signed-off-by: Jim O'Regan <joregan@kth.se> * add 2-digit area codes Signed-off-by: Jim O'Regan <joregan@kth.se> * add another Signed-off-by: Jim O'Regan <joregan@kth.se> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <joregan@kth.se> * export var Signed-off-by: Jim O'Regan <joregan@kth.se> * in progress Signed-off-by: Jim O'Regan <joregan@kth.se> * country codes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <joregan@kth.se> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * nominal digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add IP prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * add google copyright notice; probably meaningless Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on telephone 
Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix path Signed-off-by: Jim O'Regan <joregan@kth.se> * minor adaptation; more needed Signed-off-by: Jim O'Regan <joregan@kth.se> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt more Signed-off-by: Jim O'Regan <joregan@kth.se> * nearly there Signed-off-by: Jim O'Regan <joregan@kth.se> * replace with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * extend tests Signed-off-by: Jim O'Regan <joregan@kth.se> * some tweaks Signed-off-by: Jim O'Regan <joregan@kth.se> * add an IP test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * move variables Signed-off-by: Jim O'Regan <joregan@kth.se> * filter ordinals Signed-off-by: Jim O'Regan <joregan@kth.se> * basic fraction tests Signed-off-by: Jim O'Regan <joregan@kth.se> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <joregan@kth.se> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <joregan@kth.se> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test, including spaces Signed-off-by: Jim O'Regan <joregan@kth.se> * works in the repl, not in reality Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <joregan@kth.se> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test for that Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <joregan@kth.se> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <joregan@kth.se> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <joregan@kth.se> * swapping order Signed-off-by: Jim O'Regan <joregan@kth.se> * more swapping Signed-off-by: Jim O'Regan <joregan@kth.se> * remove import Signed-off-by: Jim O'Regan <joregan@kth.se> * 
add an example Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <joregan@kth.se> * some things fixed Signed-off-by: Jim O'Regan <joregan@kth.se> * more adjustments to time Signed-off-by: Jim O'Regan <joregan@kth.se> * more todo, but working for this subset Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq Signed-off-by: Jim O'Regan <joregan@kth.se> * timezone can be inflected too Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <joregan@kth.se> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <joregan@kth.se> * fix the commented ITN part Signed-off-by: Jim O'Regan <joregan@kth.se> * add hu Signed-off-by: Jim O'Regan <joregan@kth.se> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <joregan@kth.se> * fix measure cardinals Signed-off-by: Jim O'Regan <joregan@kth.se> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <joregan@kth.se> * missed removing preserver_order Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add öre (also for NOK) Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> * Comment line, for now Signed-off-by: Jim O’Regan <joregan@kth.se> * try breaking this into pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a bug in cardinal graph 
Signed-off-by: Jim O'Regan <joregan@kth.se> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <joregan@kth.se> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <joregan@kth.se> * add [be]os_or_space Signed-off-by: Jim O'Regan <joregan@kth.se> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. Signed-off-by: Jim O'Regan <joregan@kth.se> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <joregan@kth.se> * see if this makes a difference Signed-off-by: Jim O'Regan <joregan@kth.se> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <joregan@kth.se> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <joregan@kth.se> * try again Signed-off-by: Jim O'Regan <joregan@kth.se> * move that thing, merge some lines Signed-off-by: Jim O'Regan <joregan@kth.se> * at least it fails quickly Signed-off-by: Jim O'Regan <joregan@kth.se> * export original Signed-off-by: Jim O'Regan <joregan@kth.se> * move things around for no real reason Signed-off-by: Jim O'Regan <joregan@kth.se> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <joregan@kth.se> * try this again Signed-off-by: Jim O'Regan <joregan@kth.se> * pretty sure this should work. As should the other Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, try here Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <joregan@kth.se> * change the variable names Signed-off-by: Jim O'Regan <joregan@kth.se> * get rid of duplicate input print Signed-off-by: Jim O'Regan <joregan@kth.se> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <joregan@kth.se> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <joregan@kth.se> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <joregan@kth.se> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * rearrange slightly Signed-off-by: Jim O'Regan <joregan@kth.se> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add comment for testing multiple shortest paths 
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <joregan@kth.se> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <joregan@kth.se> * whitespace fixes …
ankitnv
pushed a commit
to ankitnv/NeMo-text-processing
that referenced
this pull request
Oct 24, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal changes will change back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn date Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving conflict Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases Signed-off-by: Alex Cui <alexcui1994@gmail.com> * updats on Jenkins Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jenkinspdate Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding one more test item Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixed typo on decimaltext Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed 
grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unused import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changed regular space to narrow space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports error fixing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports errors Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Jekins update for jp itn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * reverting Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixng style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jp tn date update Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * removing previously created nemo imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * test order arrangement Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolve fraction space issue Signed-off-by: Alex Cui 
<alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * empty file Signed-off-by: Alex Cui <alexcui1994@gmail.com> * to delete Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * add Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add jenkins file (#23) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * force ordinals to either have :a/:e or "." 
at the end Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal ordinal data Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add // to symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix language Signed-off-by: Jim O'Regan <joregan@kth.se> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for 
measure, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a pair of test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix plurals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add usd$ Signed-off-by: Jim O'Regan <joregan@kth.se> * insert "komma" Signed-off-by: Jim O'Regan <joregan@kth.se> * "pund" is neuter Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * towards proper graphs Signed-off-by: Jim O'Regan <joregan@kth.se> * GBP Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * make komma non-det Signed-off-by: Jim O'Regan <joregan@kth.se> * more money tagger fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <joregan@kth.se> * do a bit better with en/ett Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <joregan@kth.se> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * expansions of era abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras in verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * fix examples in comment Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <joregan@kth.se> * fix separator Signed-off-by: Jim O'Regan <joregan@kth.se> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <joregan@kth.se> * load labels Signed-off-by: Jim O'Regan <joregan@kth.se> * right first time Signed-off-by: Jim O'Regan <joregan@kth.se> * missing space Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year in test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * getting closer to getting dates working Signed-off-by: Jim O'Regan <joregan@kth.se> * add a (failing) test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <joregan@kth.se> * also handle decades Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add partially incomplete test data Signed-off-by: Jim O'Regan <joregan@kth.se> * mostly fixed test cases 
Signed-off-by: Jim O'Regan <joregan@kth.se> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <joregan@kth.se> * missed wrapping Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. Signed-off-by: Jim O'Regan <joregan@kth.se> * telephone tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * try adding more brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <joregan@kth.se> * move abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add in abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <joregan@kth.se> * single digit Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "this is not right; leading 
zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, this seems to work Signed-off-by: Jim O'Regan <joregan@kth.se> * drop the tests starting with comma Signed-off-by: Jim O'Regan <joregan@kth.se> * decimal tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * lower case Signed-off-by: Jim O'Regan <joregan@kth.se> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a very minimal test case for time Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <joregan@kth.se> * add prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * copy the roman handling from es Signed-off-by: Jim O'Regan <joregan@kth.se> * greek letters Signed-off-by: Jim O'Regan <joregan@kth.se> * some fixes to the time tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on time Signed-off-by: Jim O'Regan <joregan@kth.se> * |=, not = Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt verbaliser a little Signed-off-by: Jim O'Regan <joregan@kth.se> * add some test cases from module comments Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables to check Signed-off-by: Jim O'Regan <joregan@kth.se> * small fix Signed-off-by: Jim O'Regan <joregan@kth.se> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <joregan@kth.se> * try doing this here Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix errors in tests Signed-off-by: Jim O'Regan <joregan@kth.se> * minimal test cases for measure Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <joregan@kth.se> * merge different tsvs Signed-off-by: Jim O'Regan <joregan@kth.se> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables for testing Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * need an en/ett split here too Signed-off-by: Jim O'Regan <joregan@kth.se> * fix decimal subgraph Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo, I've just done it Signed-off-by: Jim O'Regan <joregan@kth.se> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek letters in maths Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek here too Signed-off-by: Jim O'Regan <joregan@kth.se> * minor sg/pl Signed-off-by: Jim O'Regan <joregan@kth.se> * dedup Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * put these under if, too Signed-off-by: Jim O'Regan <joregan@kth.se> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <joregan@kth.se> * export variables to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * 
here is one error Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <joregan@kth.se> * export a variable Signed-off-by: Jim O'Regan <joregan@kth.se> * add a tesst case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * . is not a cardinal separator Signed-off-by: Jim O'Regan <joregan@kth.se> * fix case Signed-off-by: Jim O'Regan <joregan@kth.se> * add yen Signed-off-by: Jim O'Regan <joregan@kth.se> * final fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove English roman tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * remove some unused pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <joregan@kth.se> * warnings about missing whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * add sv Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan
<joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year Signed-off-by: Jim O'Regan <joregan@kth.se> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <joregan@kth.se> * address codeql comments Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <joregan@kth.se> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <joregan@kth.se> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <joregan@kth.se> * remove broken duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <joregan@kth.se> * time tests now pass Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <joregan@kth.se> * import delete_preserve_order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <joregan@kth.se> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <joregan@kth.se> * move to the correct subdirectory Signed-off-by: Jim O'Regan <joregan@kth.se> * add swedish Signed-off-by: Jim O'Regan <joregan@kth.se> * remove pynini checks Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix here also Signed-off-by: Jim O'Regan <joregan@kth.se> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * add a date case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove duplication 
Signed-off-by: Jim O'Regan <joregan@kth.se> * boost n_tagged Signed-off-by: Jim O'Regan <joregan@kth.se> * also copyright this year Signed-off-by: Jim O'Regan <joregan@kth.se> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <joregan@kth.se> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <joregan@kth.se> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <joregan@kth.se> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * days of the week Signed-off-by: Jim O'Regan <joregan@kth.se> * add more abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove blank line Signed-off-by: Jim O'Regan <joregan@kth.se> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <joregan@kth.se> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CI setup (#25) * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci _cr Signed-off-by: ekmb <ebakhturina@nvidia.com> * revert setup tool Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove pytest-runner from setup.py Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update 
test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip Signed-off-by: ekmb <ebakhturina@nvidia.com> * electronic pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * test pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove unused imports Signed-off-by: ekmb <ebakhturina@nvidia.com> * add deterministic option normalized options Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins grammar folder Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up, update for SH Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * reduce cardinal graph Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * add weight for sh Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Tn en astronomical no (#28) * Add 
support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix stage Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <ebakhturina@nvidia.com> * add whitelist to export Signed-off-by: ekmb <ebakhturina@nvidia.com> * update docstrings Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix for measures Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> --------- Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang 
<yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.6rc0 (#37) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run language tests in stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update DE cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <joregan@kth.se> 
Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix telephone, ordinal Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * update electronic Signed-off-by: ekmb <ebakhturina@nvidia.com> * review feedback, update whitelist Signed-off-by: ekmb <ebakhturina@nvidia.com> * rename capitalize func Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix SH tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins folder name Signed-off-by: ekmb <ebakhturina@nvidia.com> * added cased arg to ITN Signed-off-by: ekmb <ebakhturina@nvidia.com> * add input_case arg to other lang Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dirs update Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix codeql errors Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix sh Signed-off-by: ekmb <ebakhturina@nvidia.com> * review Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder for EN Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * extend alignment for itn Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added test to pr doc Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <joregan@kth.se> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <joregan@kth.se> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan 
<joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix sv tests (#52) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.7 release (#53) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update Jenkinsfile Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for quantities Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * change integer Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <joregan@kth.se> * superscript to superessive Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes 
from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * fix var Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum electronic test Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <joregan@kth.se> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add some alternative measure forms Signed-off-by: Jim O'Regan <joregan@kth.se> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal time test Signed-off-by: Jim O'Regan <joregan@kth.se> * will want cardinal here Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <joregan@kth.se> * move two letters Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * small changes Signed-off-by: Jim O'Regan <joregan@kth.se> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * other ways of reading w Signed-off-by: Jim O'Regan <joregan@kth.se> * for non deterministic, a bunch of these symbols 
can be read as letters Signed-off-by: Jim O'Regan <joregan@kth.se> * currency Signed-off-by: Jim O'Regan <joregan@kth.se> * more inflection Signed-off-by: Jim O'Regan <joregan@kth.se> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * working now, add a comment Signed-off-by: Jim O'Regan <joregan@kth.se> * also integer, and preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * also accept the full words Signed-off-by: Jim O'Regan <joregan@kth.se> * deduplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt comments Signed-off-by: Jim O'Regan <joregan@kth.se> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <joregan@kth.se> * duplicate space Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * actually saving the adaptations Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan 
<joregan@kth.se> * remove pynini checks from tests Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix cache dir Signed-off-by: Jim O'Regan <joregan@kth.se> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add basic tests (native verified) Signed-off-by: Jim O'Regan <joregan@kth.se> * add components for read digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add an example with a different separator Signed-off-by: Jim O'Regan <joregan@kth.se> * start adapting Signed-off-by: Jim O'Regan <joregan@kth.se> * add 2-digit area codes Signed-off-by: Jim O'Regan <joregan@kth.se> * add another Signed-off-by: Jim O'Regan <joregan@kth.se> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <joregan@kth.se> * export var Signed-off-by: Jim O'Regan <joregan@kth.se> * in progress Signed-off-by: Jim O'Regan <joregan@kth.se> * country codes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <joregan@kth.se> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * nominal digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add IP prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * add google copyright notice; probably meaningless Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on telephone 
Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix path Signed-off-by: Jim O'Regan <joregan@kth.se> * minor adaptation; more needed Signed-off-by: Jim O'Regan <joregan@kth.se> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt more Signed-off-by: Jim O'Regan <joregan@kth.se> * nearly there Signed-off-by: Jim O'Regan <joregan@kth.se> * replace with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * extend tests Signed-off-by: Jim O'Regan <joregan@kth.se> * some tweaks Signed-off-by: Jim O'Regan <joregan@kth.se> * add an IP test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * move variables Signed-off-by: Jim O'Regan <joregan@kth.se> * filter ordinals Signed-off-by: Jim O'Regan <joregan@kth.se> * basic fraction tests Signed-off-by: Jim O'Regan <joregan@kth.se> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <joregan@kth.se> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <joregan@kth.se> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test, including spaces Signed-off-by: Jim O'Regan <joregan@kth.se> * works in the repl, not in reality Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <joregan@kth.se> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test for that Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <joregan@kth.se> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <joregan@kth.se> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <joregan@kth.se> * swapping order Signed-off-by: Jim O'Regan <joregan@kth.se> * more swapping Signed-off-by: Jim O'Regan <joregan@kth.se> * remove import Signed-off-by: Jim O'Regan <joregan@kth.se> * 
add an example Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <joregan@kth.se> * some things fixed Signed-off-by: Jim O'Regan <joregan@kth.se> * more adjustments to time Signed-off-by: Jim O'Regan <joregan@kth.se> * more todo, but working for this subset Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq Signed-off-by: Jim O'Regan <joregan@kth.se> * timezone can be inflected too Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <joregan@kth.se> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <joregan@kth.se> * fix the commented ITN part Signed-off-by: Jim O'Regan <joregan@kth.se> * add hu Signed-off-by: Jim O'Regan <joregan@kth.se> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <joregan@kth.se> * fix measure cardinals Signed-off-by: Jim O'Regan <joregan@kth.se> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <joregan@kth.se> * missed removing preserver_order Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add öre (also for NOK) Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> * Comment line, for now Signed-off-by: Jim O’Regan <joregan@kth.se> * try breaking this into pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a bug in cardinal graph 
Signed-off-by: Jim O'Regan <joregan@kth.se> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <joregan@kth.se> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <joregan@kth.se> * add [be]os_or_space Signed-off-by: Jim O'Regan <joregan@kth.se> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. Signed-off-by: Jim O'Regan <joregan@kth.se> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <joregan@kth.se> * see if this makes a difference Signed-off-by: Jim O'Regan <joregan@kth.se> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <joregan@kth.se> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <joregan@kth.se> * try again Signed-off-by: Jim O'Regan <joregan@kth.se> * move that thing, merge some lines Signed-off-by: Jim O'Regan <joregan@kth.se> * at least it fails quickly Signed-off-by: Jim O'Regan <joregan@kth.se> * export original Signed-off-by: Jim O'Regan <joregan@kth.se> * move things around for no real reason Signed-off-by: Jim O'Regan <joregan@kth.se> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <joregan@kth.se> * try this again Signed-off-by: Jim O'Regan <joregan@kth.se> * pretty sure this should work. As should the other Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, try here Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <joregan@kth.se> * change the variable names Signed-off-by: Jim O'Regan <joregan@kth.se> * get rid of duplicate input print Signed-off-by: Jim O'Regan <joregan@kth.se> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <joregan@kth.se> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <joregan@kth.se> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <joregan@kth.se> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * rearrange slightly Signed-off-by: Jim O'Regan <joregan@kth.se> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add comment for testing multiple shortest paths 
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <joregan@kth.se> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <joregan@kth.se> * whitespace fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * also fix in the verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * Update Jenkinsfile Signed-off-by: Jim O’Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <enno.hermann@idiap.ch> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add inits Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal changes will change back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn date Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving conflict Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases Signed-off-by: Alex Cui <alexcui1994@gmail.com> * updats on Jenkins Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jenkinspdate Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding one more test item Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixed typo on decimaltext Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed 
grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unused import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changed regular space to narrow space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports error fixing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports errors Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Jekins update for jp itn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * reverting Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixng style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jp tn date update Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * removing previously created nemo imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * test order arrangement Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolve fraction space issue Signed-off-by: Alex Cui 
<alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unused import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * empty file Signed-off-by: Alex Cui <alexcui1994@gmail.com> * to delete Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * add Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add jenkins file (#23) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * force ordinals to either have :a/:e or "." 
at the end Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal ordinal data Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add // to symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix language Signed-off-by: Jim O'Regan <joregan@kth.se> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for 
measure, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a pair of test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix plurals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add usd$ Signed-off-by: Jim O'Regan <joregan@kth.se> * insert "komma" Signed-off-by: Jim O'Regan <joregan@kth.se> * "pund" is neuter Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * towards proper graphs Signed-off-by: Jim O'Regan <joregan@kth.se> * GBP Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * make komma non-det Signed-off-by: Jim O'Regan <joregan@kth.se> * more money tagger fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <joregan@kth.se> * do a bit better with en/ett Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <joregan@kth.se> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * expansions of era abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras in verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * fix examples in comment Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <joregan@kth.se> * fix separator Signed-off-by: Jim O'Regan <joregan@kth.se> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <joregan@kth.se> * load labels Signed-off-by: Jim O'Regan <joregan@kth.se> * right first time Signed-off-by: Jim O'Regan <joregan@kth.se> * missing space Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year in test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * getting closer to getting dates working Signed-off-by: Jim O'Regan <joregan@kth.se> * add a (failing) test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <joregan@kth.se> * also handle decades Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add partially incomplete test data Signed-off-by: Jim O'Regan <joregan@kth.se> * mostly fixed test cases 
Signed-off-by: Jim O'Regan <joregan@kth.se> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <joregan@kth.se> * missed wrapping Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. Signed-off-by: Jim O'Regan <joregan@kth.se> * telephone tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * try adding more brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <joregan@kth.se> * move abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add in abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <joregan@kth.se> * single digit Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "this is not right; leading 
zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, this seems to work Signed-off-by: Jim O'Regan <joregan@kth.se> * drop the tests starting with comma Signed-off-by: Jim O'Regan <joregan@kth.se> * decimal tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * lower case Signed-off-by: Jim O'Regan <joregan@kth.se> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a very minimal test case for time Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <joregan@kth.se> * add prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * copy the roman handling from es Signed-off-by: Jim O'Regan <joregan@kth.se> * greek letters Signed-off-by: Jim O'Regan <joregan@kth.se> * some fixes to the time tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on time Signed-off-by: Jim O'Regan <joregan@kth.se> * |=, not = Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt verbaliser a little Signed-off-by: Jim O'Regan <joregan@kth.se> * add some test cases from module comments Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables to check Signed-off-by: Jim O'Regan <joregan@kth.se> * small fix Signed-off-by: Jim O'Regan <joregan@kth.se> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <joregan@kth.se> * try doing this here Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix errors in tests Signed-off-by: Jim O'Regan <joregan@kth.se> * minimal test cases for measure Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <joregan@kth.se> * merge different tsvs Signed-off-by: Jim O'Regan <joregan@kth.se> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables for testing Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * need an en/ett split here too Signed-off-by: Jim O'Regan <joregan@kth.se> * fix decimal subgraph Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo, I've just done it Signed-off-by: Jim O'Regan <joregan@kth.se> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek letters in maths Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek here too Signed-off-by: Jim O'Regan <joregan@kth.se> * minor sg/pl Signed-off-by: Jim O'Regan <joregan@kth.se> * dedup Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * put these under if, too Signed-off-by: Jim O'Regan <joregan@kth.se> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <joregan@kth.se> * export variables to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * 
here is one error Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <joregan@kth.se> * export a variable Signed-off-by: Jim O'Regan <joregan@kth.se> * add a tesst case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * . is not a cardinal separator Signed-off-by: Jim O'Regan <joregan@kth.se> * fix case Signed-off-by: Jim O'Regan <joregan@kth.se> * add yen Signed-off-by: Jim O'Regan <joregan@kth.se> * final fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove English roman tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * remove some unused pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <joregan@kth.se> * warnings about missing whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * add sv Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan 
<joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year Signed-off-by: Jim O'Regan <joregan@kth.se> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhawk tests -- untested (: Signed-off-by: Jim O'Regan <joregan@kth.se> * address codeql comments Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <joregan@kth.se> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <joregan@kth.se> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <joregan@kth.se> * remove broken duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <joregan@kth.se> * time tests now pass Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <joregan@kth.se> * import delete_preserve_order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <joregan@kth.se> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <joregan@kth.se> * move to the correct subdirectory Signed-off-by: Jim O'Regan <joregan@kth.se> * add swedish Signed-off-by: Jim O'Regan <joregan@kth.se> * remove pynini checks Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix here also Signed-off-by: Jim O'Regan <joregan@kth.se> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * add a date case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove duplication 
Signed-off-by: Jim O'Regan <joregan@kth.se> * boost n_tagged Signed-off-by: Jim O'Regan <joregan@kth.se> * also copyright this year Signed-off-by: Jim O'Regan <joregan@kth.se> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <joregan@kth.se> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <joregan@kth.se> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <joregan@kth.se> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * days of the week Signed-off-by: Jim O'Regan <joregan@kth.se> * add more abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove blank line Signed-off-by: Jim O'Regan <joregan@kth.se> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <joregan@kth.se> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CI setup (#25) * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci _cr Signed-off-by: ekmb <ebakhturina@nvidia.com> * revert setup tool Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove pytest-runner from setup.py Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update 
test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip Signed-off-by: ekmb <ebakhturina@nvidia.com> * electronic pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * test pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove unused imports Signed-off-by: ekmb <ebakhturina@nvidia.com> * add deterministic option normalized options Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins grammar folder Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up, update for SH Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * reduce cardinal graph Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * add weight for sh Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Tn en astronomical no (#28) * Add 
support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix stage Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <ebakhturina@nvidia.com> * add whitelist to export Signed-off-by: ekmb <ebakhturina@nvidia.com> * update docstrings Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix for measures Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> --------- Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang 
<yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.6rc0 (#37) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run language tests in stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update DE cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <joregan@kth.se> 
Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix telephone, ordinal Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * update electronic Signed-off-by: ekmb <ebakhturina@nvidia.com> * review feedback, update whitelist Signed-off-by: ekmb <ebakhturina@nvidia.com> * rename capitalize func Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix SH tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins folder name Signed-off-by: ekmb <ebakhturina@nvidia.com> * added cased arg to ITN Signed-off-by: ekmb <ebakhturina@nvidia.com> * add input_case arg to other lang Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dirs update Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix codeql errors Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix sh Signed-off-by: ekmb <ebakhturina@nvidia.com> * review Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder for EN Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * extend alignment for itn Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added test to pr doc Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <joregan@kth.se> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <joregan@kth.se> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan 
<joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix sv tests (#52) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.7 release (#53) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update Jenkinsfile Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for quantities Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * change integer Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <joregan@kth.se> * superscript to superessive Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes 
from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * fix var Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum electronic test Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <joregan@kth.se> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add some alternative measure forms Signed-off-by: Jim O'Regan <joregan@kth.se> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal time test Signed-off-by: Jim O'Regan <joregan@kth.se> * will want cardinal here Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <joregan@kth.se> * move two letters Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * small changes Signed-off-by: Jim O'Regan <joregan@kth.se> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * other ways of reading w Signed-off-by: Jim O'Regan <joregan@kth.se> * for non deterministic, a bunch of these symbols 
can be read as letters Signed-off-by: Jim O'Regan <joregan@kth.se> * currency Signed-off-by: Jim O'Regan <joregan@kth.se> * more inflection Signed-off-by: Jim O'Regan <joregan@kth.se> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * working now, add a comment Signed-off-by: Jim O'Regan <joregan@kth.se> * also integer, and preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * also accept the full words Signed-off-by: Jim O'Regan <joregan@kth.se> * deduplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt comments Signed-off-by: Jim O'Regan <joregan@kth.se> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <joregan@kth.se> * duplicate space Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * actually saving the adaptations Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan 
<joregan@kth.se> * remove pynini checks from tests Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix cache dir Signed-off-by: Jim O'Regan <joregan@kth.se> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add basic tests (native verified) Signed-off-by: Jim O'Regan <joregan@kth.se> * add components for read digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add an example with a different separator Signed-off-by: Jim O'Regan <joregan@kth.se> * start adapting Signed-off-by: Jim O'Regan <joregan@kth.se> * add 2-digit area codes Signed-off-by: Jim O'Regan <joregan@kth.se> * add another Signed-off-by: Jim O'Regan <joregan@kth.se> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <joregan@kth.se> * export var Signed-off-by: Jim O'Regan <joregan@kth.se> * in progress Signed-off-by: Jim O'Regan <joregan@kth.se> * country codes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <joregan@kth.se> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * nominal digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add IP prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * add google copyright notice; probably meaningless Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on telephone 
Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix path Signed-off-by: Jim O'Regan <joregan@kth.se> * minor adaptation; more needed Signed-off-by: Jim O'Regan <joregan@kth.se> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt more Signed-off-by: Jim O'Regan <joregan@kth.se> * nearly there Signed-off-by: Jim O'Regan <joregan@kth.se> * replace with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * extend tests Signed-off-by: Jim O'Regan <joregan@kth.se> * some tweaks Signed-off-by: Jim O'Regan <joregan@kth.se> * add an IP test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * move variables Signed-off-by: Jim O'Regan <joregan@kth.se> * filter ordinals Signed-off-by: Jim O'Regan <joregan@kth.se> * basic fraction tests Signed-off-by: Jim O'Regan <joregan@kth.se> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <joregan@kth.se> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <joregan@kth.se> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test, including spaces Signed-off-by: Jim O'Regan <joregan@kth.se> * works in the repl, not in reality Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <joregan@kth.se> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test for that Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <joregan@kth.se> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <joregan@kth.se> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <joregan@kth.se> * swapping order Signed-off-by: Jim O'Regan <joregan@kth.se> * more swapping Signed-off-by: Jim O'Regan <joregan@kth.se> * remove import Signed-off-by: Jim O'Regan
<joregan@kth.se> * add an example Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <joregan@kth.se> * some things fixed Signed-off-by: Jim O'Regan <joregan@kth.se> * more adjustments to time Signed-off-by: Jim O'Regan <joregan@kth.se> * more todo, but working for this subset Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq Signed-off-by: Jim O'Regan <joregan@kth.se> * timezone can be inflected too Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <joregan@kth.se> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <joregan@kth.se> * fix the commented ITN part Signed-off-by: Jim O'Regan <joregan@kth.se> * add hu Signed-off-by: Jim O'Regan <joregan@kth.se> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <joregan@kth.se> * fix measure cardinals Signed-off-by: Jim O'Regan <joregan@kth.se> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <joregan@kth.se> * missed removing preserver_order Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add öre (also for NOK) Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> * Comment line, for now Signed-off-by: Jim O’Regan <joregan@kth.se> * try breaking this into pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a bug in cardinal graph 
Signed-off-by: Jim O'Regan <joregan@kth.se> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <joregan@kth.se> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <joregan@kth.se> * add [be]os_or_space Signed-off-by: Jim O'Regan <joregan@kth.se> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. Signed-off-by: Jim O'Regan <joregan@kth.se> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <joregan@kth.se> * see if this makes a difference Signed-off-by: Jim O'Regan <joregan@kth.se> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <joregan@kth.se> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <joregan@kth.se> * try again Signed-off-by: Jim O'Regan <joregan@kth.se> * move that thing, merge some lines Signed-off-by: Jim O'Regan <joregan@kth.se> * at least it fails quickly Signed-off-by: Jim O'Regan <joregan@kth.se> * export original Signed-off-by: Jim O'Regan <joregan@kth.se> * move things around for no real reason Signed-off-by: Jim O'Regan <joregan@kth.se> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <joregan@kth.se> * try this again Signed-off-by: Jim O'Regan <joregan@kth.se> * pretty sure this should work. As should the other Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, try here Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <joregan@kth.se> * change the variable names Signed-off-by: Jim O'Regan <joregan@kth.se> * get rid of duplicate input print Signed-off-by: Jim O'Regan <joregan@kth.se> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <joregan@kth.se> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <joregan@kth.se> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <joregan@kth.se> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * rearrange slightly Signed-off-by: Jim O'Regan <joregan@kth.se> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add comment for testing multiple shortest paths 
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <joregan@kth.se> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <joregan@kth.se> * whitespace fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * also fix in the verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * Update Jenkinsfile Signed-off-by: Jim O’Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <enno.hermann@idiap.ch> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add inits Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv
pushed a commit
to ankitnv/NeMo-text-processing
that referenced
this pull request
Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal changes will change back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn date Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving conflict Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases Signed-off-by: Alex Cui <alexcui1994@gmail.com> * updats on Jenkins Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jenkinspdate Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding one more test item Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixed typo on decimaltext Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed 
grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unused import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changed regular space to narrow space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports error fixing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports errors Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Jekins update for jp itn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * reverting Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixng style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jp tn date update Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * removing previously created nemo imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * test order arrangement Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolve fraction space issue Signed-off-by: Alex Cui 
<alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * empty file Signed-off-by: Alex Cui <alexcui1994@gmail.com> * to delete Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * add Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add jenkins file (#23) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * force ordinals to either have :a/:e or "." 
at the end Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal ordinal data Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add // to symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix language Signed-off-by: Jim O'Regan <joregan@kth.se> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for 
measure, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a pair of test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix plurals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add usd$ Signed-off-by: Jim O'Regan <joregan@kth.se> * insert "komma" Signed-off-by: Jim O'Regan <joregan@kth.se> * "pund" is neuter Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * towards proper graphs Signed-off-by: Jim O'Regan <joregan@kth.se> * GBP Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * make komma non-det Signed-off-by: Jim O'Regan <joregan@kth.se> * more money tagger fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <joregan@kth.se> * do a bit better with en/ett Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <joregan@kth.se> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * expansions of era abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras in verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * fix examples in comment Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <joregan@kth.se> * fix separator Signed-off-by: Jim O'Regan <joregan@kth.se> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <joregan@kth.se> * load labels Signed-off-by: Jim O'Regan <joregan@kth.se> * right first time Signed-off-by: Jim O'Regan <joregan@kth.se> * missing space Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year in test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * getting closer to getting dates working Signed-off-by: Jim O'Regan <joregan@kth.se> * add a (failing) test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <joregan@kth.se> * also handle decades Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add partially incomplete test data Signed-off-by: Jim O'Regan <joregan@kth.se> * mostly fixed test cases 
Signed-off-by: Jim O'Regan <joregan@kth.se> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <joregan@kth.se> * missed wrapping Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. Signed-off-by: Jim O'Regan <joregan@kth.se> * telephone tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * try adding more brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <joregan@kth.se> * move abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add in abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <joregan@kth.se> * single digit Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "this is not right; leading 
zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, this seems to work Signed-off-by: Jim O'Regan <joregan@kth.se> * drop the tests starting with comma Signed-off-by: Jim O'Regan <joregan@kth.se> * decimal tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * lower case Signed-off-by: Jim O'Regan <joregan@kth.se> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a very minimal test case for time Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <joregan@kth.se> * add prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * copy the roman handling from es Signed-off-by: Jim O'Regan <joregan@kth.se> * greek letters Signed-off-by: Jim O'Regan <joregan@kth.se> * some fixes to the time tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on time Signed-off-by: Jim O'Regan <joregan@kth.se> * |=, not = Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt verbaliser a little Signed-off-by: Jim O'Regan <joregan@kth.se> * add some test cases from module comments Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables to check Signed-off-by: Jim O'Regan <joregan@kth.se> * small fix Signed-off-by: Jim O'Regan <joregan@kth.se> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <joregan@kth.se> * try doing this here Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix errors in tests Signed-off-by: Jim O'Regan <joregan@kth.se> * minimal test cases for measure Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <joregan@kth.se> * merge different tsvs Signed-off-by: Jim O'Regan <joregan@kth.se> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables for testing Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * need an en/ett split here too Signed-off-by: Jim O'Regan <joregan@kth.se> * fix decimal subgraph Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo, I've just done it Signed-off-by: Jim O'Regan <joregan@kth.se> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek letters in maths Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek here too Signed-off-by: Jim O'Regan <joregan@kth.se> * minor sg/pl Signed-off-by: Jim O'Regan <joregan@kth.se> * dedup Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * put these under if, too Signed-off-by: Jim O'Regan <joregan@kth.se> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <joregan@kth.se> * export variables to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan 
<joregan@kth.se> * here is one error Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <joregan@kth.se> * export a variable Signed-off-by: Jim O'Regan <joregan@kth.se> * add a tesst case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * . is not a cardinal separator Signed-off-by: Jim O'Regan <joregan@kth.se> * fix case Signed-off-by: Jim O'Regan <joregan@kth.se> * add yen Signed-off-by: Jim O'Regan <joregan@kth.se> * final fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove English roman tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * remove some unused pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <joregan@kth.se> * warnings about missing whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * add sv Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: 
Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year Signed-off-by: Jim O'Regan <joregan@kth.se> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <joregan@kth.se> * address codeql comments Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <joregan@kth.se> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <joregan@kth.se> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <joregan@kth.se> * remove broken duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <joregan@kth.se> * time tests now pass Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <joregan@kth.se> * import delete_preserve_order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <joregan@kth.se> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <joregan@kth.se> * move to the correct subdirectory Signed-off-by: Jim O'Regan <joregan@kth.se> * add swedish Signed-off-by: Jim O'Regan <joregan@kth.se> * remove pynini checks Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix here also Signed-off-by: Jim O'Regan <joregan@kth.se> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * add a date case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove duplication 
Signed-off-by: Jim O'Regan <joregan@kth.se> * boost n_tagged Signed-off-by: Jim O'Regan <joregan@kth.se> * also copyright this year Signed-off-by: Jim O'Regan <joregan@kth.se> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <joregan@kth.se> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <joregan@kth.se> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <joregan@kth.se> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * days of the week Signed-off-by: Jim O'Regan <joregan@kth.se> * add more abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove blank line Signed-off-by: Jim O'Regan <joregan@kth.se> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <joregan@kth.se> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CI setup (#25) * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci _cr Signed-off-by: ekmb <ebakhturina@nvidia.com> * revert setup tool Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove pytest-runner from setup.py Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update 
test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip Signed-off-by: ekmb <ebakhturina@nvidia.com> * electronic pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * test pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove unused imports Signed-off-by: ekmb <ebakhturina@nvidia.com> * add deterministic option normalized options Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins grammar folder Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up, update for SH Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * reduce cardinal graph Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * add weight for sh Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Tn en astronomical no (#28) * Add 
support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix stage Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <ebakhturina@nvidia.com> * add whitelist to export Signed-off-by: ekmb <ebakhturina@nvidia.com> * update docstrings Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix for measures Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> --------- Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang 
<yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.6rc0 (#37) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run language tests in stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update DE cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <joregan@kth.se> 
Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix telephone, ordinal Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * update electronic Signed-off-by: ekmb <ebakhturina@nvidia.com> * review feedback, update whitelist Signed-off-by: ekmb <ebakhturina@nvidia.com> * rename capitalize func Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix SH tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins folder name Signed-off-by: ekmb <ebakhturina@nvidia.com> * added cased arg to ITN Signed-off-by: ekmb <ebakhturina@nvidia.com> * add input_case arg to other lang Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dirs update Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix codeql errors Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix sh Signed-off-by: ekmb <ebakhturina@nvidia.com> * review Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder for EN Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * extend alignment for itn Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added test to pr doc Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <joregan@kth.se> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <joregan@kth.se> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan 
<joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix sv tests (#52) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.7 release (#53) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update Jenkinsfile Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for quantities Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * change integer Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <joregan@kth.se> * superscript to superessive Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes 
from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * fix var Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum electronic test Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <joregan@kth.se> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add some alternative measure forms Signed-off-by: Jim O'Regan <joregan@kth.se> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal time test Signed-off-by: Jim O'Regan <joregan@kth.se> * will want cardinal here Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <joregan@kth.se> * move two letters Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * small changes Signed-off-by: Jim O'Regan <joregan@kth.se> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * other ways of reading w Signed-off-by: Jim O'Regan <joregan@kth.se> * for non deterministic, a bunch of these symbols 
can be read as letters Signed-off-by: Jim O'Regan <joregan@kth.se> * currency Signed-off-by: Jim O'Regan <joregan@kth.se> * more inflection Signed-off-by: Jim O'Regan <joregan@kth.se> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * working now, add a comment Signed-off-by: Jim O'Regan <joregan@kth.se> * also integer, and preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * also accept the full words Signed-off-by: Jim O'Regan <joregan@kth.se> * deduplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt comments Signed-off-by: Jim O'Regan <joregan@kth.se> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <joregan@kth.se> * duplicate space Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * actually saving the adaptations Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan 
<joregan@kth.se> * remove pynini checks from tests Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix cache dir Signed-off-by: Jim O'Regan <joregan@kth.se> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add basic tests (native verified) Signed-off-by: Jim O'Regan <joregan@kth.se> * add components for read digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add an example with a different separator Signed-off-by: Jim O'Regan <joregan@kth.se> * start adapting Signed-off-by: Jim O'Regan <joregan@kth.se> * add 2-digit area codes Signed-off-by: Jim O'Regan <joregan@kth.se> * add another Signed-off-by: Jim O'Regan <joregan@kth.se> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <joregan@kth.se> * export var Signed-off-by: Jim O'Regan <joregan@kth.se> * in progress Signed-off-by: Jim O'Regan <joregan@kth.se> * country codes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <joregan@kth.se> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * nominal digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add IP prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * add google copyright notice; probably meaningless Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on telephone 
Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix path Signed-off-by: Jim O'Regan <joregan@kth.se> * minor adaptation; more needed Signed-off-by: Jim O'Regan <joregan@kth.se> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt more Signed-off-by: Jim O'Regan <joregan@kth.se> * nearly there Signed-off-by: Jim O'Regan <joregan@kth.se> * replace with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * extend tests Signed-off-by: Jim O'Regan <joregan@kth.se> * some tweaks Signed-off-by: Jim O'Regan <joregan@kth.se> * add an IP test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * move variables Signed-off-by: Jim O'Regan <joregan@kth.se> * filter ordinals Signed-off-by: Jim O'Regan <joregan@kth.se> * basic fraction tests Signed-off-by: Jim O'Regan <joregan@kth.se> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <joregan@kth.se> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <joregan@kth.se> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test, including spaces Signed-off-by: Jim O'Regan <joregan@kth.se> * works in the repl, not in reality Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <joregan@kth.se> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test for that Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <joregan@kth.se> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <joregan@kth.se> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <joregan@kth.se> * swapping order Signed-off-by: Jim O'Regan <joregan@kth.se> * more swapping Signed-off-by: Jim O'Regan <joregan@kth.se> * remove import Signed-off-by: Jim O'Regan <joregan@kth.se> * 
add an example Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <joregan@kth.se> * some things fixed Signed-off-by: Jim O'Regan <joregan@kth.se> * more adjustments to time Signed-off-by: Jim O'Regan <joregan@kth.se> * more todo, but working for this subset Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq Signed-off-by: Jim O'Regan <joregan@kth.se> * timezone can be inflected too Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <joregan@kth.se> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <joregan@kth.se> * fix the commented ITN part Signed-off-by: Jim O'Regan <joregan@kth.se> * add hu Signed-off-by: Jim O'Regan <joregan@kth.se> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <joregan@kth.se> * fix measure cardinals Signed-off-by: Jim O'Regan <joregan@kth.se> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <joregan@kth.se> * missed removing preserver_order Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add öre (also for NOK) Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> * Comment line, for now Signed-off-by: Jim O’Regan <joregan@kth.se> * try breaking this into pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a bug in cardinal graph 
Signed-off-by: Jim O'Regan <joregan@kth.se> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <joregan@kth.se> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <joregan@kth.se> * add [be]os_or_space Signed-off-by: Jim O'Regan <joregan@kth.se> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. Signed-off-by: Jim O'Regan <joregan@kth.se> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <joregan@kth.se> * see if this makes a difference Signed-off-by: Jim O'Regan <joregan@kth.se> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <joregan@kth.se> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <joregan@kth.se> * try again Signed-off-by: Jim O'Regan <joregan@kth.se> * move that thing, merge some lines Signed-off-by: Jim O'Regan <joregan@kth.se> * at least it fails quickly Signed-off-by: Jim O'Regan <joregan@kth.se> * export original Signed-off-by: Jim O'Regan <joregan@kth.se> * move things around for no real reason Signed-off-by: Jim O'Regan <joregan@kth.se> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <joregan@kth.se> * try this again Signed-off-by: Jim O'Regan <joregan@kth.se> * pretty sure this should work. As should the other Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, try here Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <joregan@kth.se> * change the variable names Signed-off-by: Jim O'Regan <joregan@kth.se> * get rid of duplicate input print Signed-off-by: Jim O'Regan <joregan@kth.se> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <joregan@kth.se> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <joregan@kth.se> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <joregan@kth.se> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * rearrange slightly Signed-off-by: Jim O'Regan <joregan@kth.se> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add comment for testing multiple shortest paths 
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <joregan@kth.se> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <joregan@kth.se> * whitespace fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * also fix in the verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * Update Jenkinsfile Signed-off-by: Jim O’Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <enno.hermann@idiap.ch> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add inits Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal changes will change back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn date Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving conflict Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases Signed-off-by: Alex Cui <alexcui1994@gmail.com> * updats on Jenkins Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jenkinspdate Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding one more test item Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixed typo on decimaltext Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed 
grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unused import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changed regular space to narrow space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports error fixing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports errors Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Jekins update for jp itn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * reverting Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixng style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jp tn date update Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * removing previously created nemo imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * test order arrangement Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolve fraction space issue Signed-off-by: Alex Cui 
<alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * empty file Signed-off-by: Alex Cui <alexcui1994@gmail.com> * to delete Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * add Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add jenkins file (#23) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * force ordinals to either have :a/:e or "." 
at the end Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal ordinal data Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add // to symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix language Signed-off-by: Jim O'Regan <joregan@kth.se> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for 
measure, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a pair of test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix plurals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add usd$ Signed-off-by: Jim O'Regan <joregan@kth.se> * insert "komma" Signed-off-by: Jim O'Regan <joregan@kth.se> * "pund" is neuter Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * towards proper graphs Signed-off-by: Jim O'Regan <joregan@kth.se> * GBP Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * make komma non-det Signed-off-by: Jim O'Regan <joregan@kth.se> * more money tagger fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <joregan@kth.se> * do a bit better with en/ett Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <joregan@kth.se> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * expansions of era abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras in verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * fix examples in comment Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <joregan@kth.se> * fix separator Signed-off-by: Jim O'Regan <joregan@kth.se> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <joregan@kth.se> * load labels Signed-off-by: Jim O'Regan <joregan@kth.se> * right first time Signed-off-by: Jim O'Regan <joregan@kth.se> * missing space Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year in test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * getting closer to getting dates working Signed-off-by: Jim O'Regan <joregan@kth.se> * add a (failing) test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <joregan@kth.se> * also handle decades Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add partially incomplete test data Signed-off-by: Jim O'Regan <joregan@kth.se> * mostly fixed test cases 
Signed-off-by: Jim O'Regan <joregan@kth.se> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <joregan@kth.se> * missed wrapping Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. Signed-off-by: Jim O'Regan <joregan@kth.se> * telephone tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * try adding more brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <joregan@kth.se> * move abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add in abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <joregan@kth.se> * single digit Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "this is not right; leading 
zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, this seems to work Signed-off-by: Jim O'Regan <joregan@kth.se> * drop the tests starting with comma Signed-off-by: Jim O'Regan <joregan@kth.se> * decimal tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * lower case Signed-off-by: Jim O'Regan <joregan@kth.se> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a very minimal test case for time Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <joregan@kth.se> * add prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * copy the roman handling from es Signed-off-by: Jim O'Regan <joregan@kth.se> * greek letters Signed-off-by: Jim O'Regan <joregan@kth.se> * some fixes to the time tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on time Signed-off-by: Jim O'Regan <joregan@kth.se> * |=, not = Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt verbaliser a little Signed-off-by: Jim O'Regan <joregan@kth.se> * add some test cases from module comments Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables to check Signed-off-by: Jim O'Regan <joregan@kth.se> * small fix Signed-off-by: Jim O'Regan <joregan@kth.se> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <joregan@kth.se> * try doing this here Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix errors in tests Signed-off-by: Jim O'Regan <joregan@kth.se> * minimal test cases for measure Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <joregan@kth.se> * merge different tsvs Signed-off-by: Jim O'Regan <joregan@kth.se> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables for testing Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * need an en/ett split here too Signed-off-by: Jim O'Regan <joregan@kth.se> * fix decimal subgraph Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo, I've just done it Signed-off-by: Jim O'Regan <joregan@kth.se> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek letters in maths Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek here too Signed-off-by: Jim O'Regan <joregan@kth.se> * minor sg/pl Signed-off-by: Jim O'Regan <joregan@kth.se> * dedup Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * put these under if, too Signed-off-by: Jim O'Regan <joregan@kth.se> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <joregan@kth.se> * export variables to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * 
here is one error Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <joregan@kth.se> * export a variable Signed-off-by: Jim O'Regan <joregan@kth.se> * add a tesst case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * . is not a cardinal separator Signed-off-by: Jim O'Regan <joregan@kth.se> * fix case Signed-off-by: Jim O'Regan <joregan@kth.se> * add yen Signed-off-by: Jim O'Regan <joregan@kth.se> * final fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove English roman tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * remove some unused pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <joregan@kth.se> * warnings about missing whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * add sv Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan 
<joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year Signed-off-by: Jim O'Regan <joregan@kth.se> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <joregan@kth.se> * address codeql comments Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <joregan@kth.se> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <joregan@kth.se> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <joregan@kth.se> * remove broken duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <joregan@kth.se> * time tests now pass Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <joregan@kth.se> * import delete_preserve_order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <joregan@kth.se> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <joregan@kth.se> * move to the correct subdirectory Signed-off-by: Jim O'Regan <joregan@kth.se> * add swedish Signed-off-by: Jim O'Regan <joregan@kth.se> * remove pynini checks Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix here also Signed-off-by: Jim O'Regan <joregan@kth.se> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * add a date case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove duplication 
Signed-off-by: Jim O'Regan <joregan@kth.se> * boost n_tagged Signed-off-by: Jim O'Regan <joregan@kth.se> * also copyright this year Signed-off-by: Jim O'Regan <joregan@kth.se> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <joregan@kth.se> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <joregan@kth.se> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <joregan@kth.se> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * days of the week Signed-off-by: Jim O'Regan <joregan@kth.se> * add more abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove blank line Signed-off-by: Jim O'Regan <joregan@kth.se> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <joregan@kth.se> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CI setup (#25) * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci _cr Signed-off-by: ekmb <ebakhturina@nvidia.com> * revert setup tool Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove pytest-runner from setup.py Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update 
test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip Signed-off-by: ekmb <ebakhturina@nvidia.com> * electronic pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * test pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove unused imports Signed-off-by: ekmb <ebakhturina@nvidia.com> * add deterministic option normalized options Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins grammar folder Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up, update for SH Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * reduce cardinal graph Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * add weight for sh Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Tn en astronomical no (#28) * Add 
support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix stage Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <ebakhturina@nvidia.com> * add whitelist to export Signed-off-by: ekmb <ebakhturina@nvidia.com> * update docstrings Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix for measures Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> --------- Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang 
<yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.6rc0 (#37) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run language tests in stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update DE cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <joregan@kth.se> 
Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix telephone, ordinal Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * update electronic Signed-off-by: ekmb <ebakhturina@nvidia.com> * review feedback, update whitelist Signed-off-by: ekmb <ebakhturina@nvidia.com> * rename capitalize func Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix SH tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins folder name Signed-off-by: ekmb <ebakhturina@nvidia.com> * added cased arg to ITN Signed-off-by: ekmb <ebakhturina@nvidia.com> * add input_case arg to other lang Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dirs update Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix codeql errors Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix sh Signed-off-by: ekmb <ebakhturina@nvidia.com> * review Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder for EN Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * extend alignment for itn Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added test to pr doc Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <joregan@kth.se> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <joregan@kth.se> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan 
<joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix sv tests (#52) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.7 release (#53) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update Jenkinsfile Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for quantities Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * change integer Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <joregan@kth.se> * superscript to superessive Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes 
from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * fix var Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum electronic test Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <joregan@kth.se> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add some alternative measure forms Signed-off-by: Jim O'Regan <joregan@kth.se> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal time test Signed-off-by: Jim O'Regan <joregan@kth.se> * will want cardinal here Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <joregan@kth.se> * move two letters Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * small changes Signed-off-by: Jim O'Regan <joregan@kth.se> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * other ways of reading w Signed-off-by: Jim O'Regan <joregan@kth.se> * for non deterministic, a bunch of these symbols 
can be read as letters Signed-off-by: Jim O'Regan <joregan@kth.se> * currency Signed-off-by: Jim O'Regan <joregan@kth.se> * more inflection Signed-off-by: Jim O'Regan <joregan@kth.se> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * working now, add a comment Signed-off-by: Jim O'Regan <joregan@kth.se> * also integer, and preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * also accept the full words Signed-off-by: Jim O'Regan <joregan@kth.se> * deduplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt comments Signed-off-by: Jim O'Regan <joregan@kth.se> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <joregan@kth.se> * duplicate space Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * actually saving the adaptations Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan 
<joregan@kth.se> * remove pynini checks from tests Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix cache dir Signed-off-by: Jim O'Regan <joregan@kth.se> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add basic tests (native verified) Signed-off-by: Jim O'Regan <joregan@kth.se> * add components for read digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add an example with a different separator Signed-off-by: Jim O'Regan <joregan@kth.se> * start adapting Signed-off-by: Jim O'Regan <joregan@kth.se> * add 2-digit area codes Signed-off-by: Jim O'Regan <joregan@kth.se> * add another Signed-off-by: Jim O'Regan <joregan@kth.se> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <joregan@kth.se> * export var Signed-off-by: Jim O'Regan <joregan@kth.se> * in progress Signed-off-by: Jim O'Regan <joregan@kth.se> * country codes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <joregan@kth.se> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * nominal digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add IP prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * add google copyright notice; probably meaningless Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on telephone 
Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix path Signed-off-by: Jim O'Regan <joregan@kth.se> * minor adaptation; more needed Signed-off-by: Jim O'Regan <joregan@kth.se> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt more Signed-off-by: Jim O'Regan <joregan@kth.se> * nearly there Signed-off-by: Jim O'Regan <joregan@kth.se> * replace with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * extend tests Signed-off-by: Jim O'Regan <joregan@kth.se> * some tweaks Signed-off-by: Jim O'Regan <joregan@kth.se> * add an IP test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * move variables Signed-off-by: Jim O'Regan <joregan@kth.se> * filter ordinals Signed-off-by: Jim O'Regan <joregan@kth.se> * basic fraction tests Signed-off-by: Jim O'Regan <joregan@kth.se> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <joregan@kth.se> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <joregan@kth.se> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test, including spaces Signed-off-by: Jim O'Regan <joregan@kth.se> * works in the repl, not in reality Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <joregan@kth.se> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test for that Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <joregan@kth.se> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <joregan@kth.se> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <joregan@kth.se> * swapping order Signed-off-by: Jim O'Regan <joregan@kth.se> * more swapping Signed-off-by: Jim O'Regan <joregan@kth.se> * remove import Signed-off-by: Jim O'Regan
<joregan@kth.se> * add an example Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <joregan@kth.se> * some things fixed Signed-off-by: Jim O'Regan <joregan@kth.se> * more adjustments to time Signed-off-by: Jim O'Regan <joregan@kth.se> * more todo, but working for this subset Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq Signed-off-by: Jim O'Regan <joregan@kth.se> * timezone can be inflected too Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <joregan@kth.se> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <joregan@kth.se> * fix the commented ITN part Signed-off-by: Jim O'Regan <joregan@kth.se> * add hu Signed-off-by: Jim O'Regan <joregan@kth.se> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <joregan@kth.se> * fix measure cardinals Signed-off-by: Jim O'Regan <joregan@kth.se> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <joregan@kth.se> * missed removing preserver_order Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add öre (also for NOK) Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> * Comment line, for now Signed-off-by: Jim O’Regan <joregan@kth.se> * try breaking this into pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a bug in cardinal graph 
Signed-off-by: Jim O'Regan <joregan@kth.se> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <joregan@kth.se> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <joregan@kth.se> * add [be]os_or_space Signed-off-by: Jim O'Regan <joregan@kth.se> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. Signed-off-by: Jim O'Regan <joregan@kth.se> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <joregan@kth.se> * see if this makes a difference Signed-off-by: Jim O'Regan <joregan@kth.se> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <joregan@kth.se> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <joregan@kth.se> * try again Signed-off-by: Jim O'Regan <joregan@kth.se> * move that thing, merge some lines Signed-off-by: Jim O'Regan <joregan@kth.se> * at least it fails quickly Signed-off-by: Jim O'Regan <joregan@kth.se> * export original Signed-off-by: Jim O'Regan <joregan@kth.se> * move things around for no real reason Signed-off-by: Jim O'Regan <joregan@kth.se> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <joregan@kth.se> * try this again Signed-off-by: Jim O'Regan <joregan@kth.se> * pretty sure this should work. As should the other Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, try here Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <joregan@kth.se> * change the variable names Signed-off-by: Jim O'Regan <joregan@kth.se> * get rid of duplicate input print Signed-off-by: Jim O'Regan <joregan@kth.se> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <joregan@kth.se> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <joregan@kth.se> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <joregan@kth.se> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * rearrange slightly Signed-off-by: Jim O'Regan <joregan@kth.se> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add comment for testing multiple shortest paths 
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <joregan@kth.se> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <joregan@kth.se> * whitespace fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * also fix in the verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * Update Jenkinsfile Signed-off-by: Jim O’Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <enno.hermann@idiap.ch> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add inits Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal changes will change back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn date Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving conflict Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases Signed-off-by: Alex Cui <alexcui1994@gmail.com> * updats on Jenkins Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jenkinspdate Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding one more test item Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixed typo on decimaltext Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed 
grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unused import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changed regular space to narrow space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports error fixing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports errors Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Jekins update for jp itn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * reverting Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixng style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jp tn date update Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * removing previously created nemo imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * test order arrangement Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolve fraction space issue Signed-off-by: Alex Cui 
<alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * empty file Signed-off-by: Alex Cui <alexcui1994@gmail.com> * to delete Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * add Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add jenkins file (#23) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * force ordinals to either have :a/:e or "." 
at the end Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal ordinal data Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add // to symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix language Signed-off-by: Jim O'Regan <joregan@kth.se> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for 
measure, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a pair of test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix plurals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add usd$ Signed-off-by: Jim O'Regan <joregan@kth.se> * insert "komma" Signed-off-by: Jim O'Regan <joregan@kth.se> * "pund" is neuter Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * towards proper graphs Signed-off-by: Jim O'Regan <joregan@kth.se> * GBP Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * make komma non-det Signed-off-by: Jim O'Regan <joregan@kth.se> * more money tagger fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <joregan@kth.se> * do a bit better with en/ett Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <joregan@kth.se> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * expansions of era abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras in verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * fix examples in comment Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <joregan@kth.se> * fix separator Signed-off-by: Jim O'Regan <joregan@kth.se> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <joregan@kth.se> * load labels Signed-off-by: Jim O'Regan <joregan@kth.se> * right first time Signed-off-by: Jim O'Regan <joregan@kth.se> * missing space Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year in test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * getting closer to getting dates working Signed-off-by: Jim O'Regan <joregan@kth.se> * add a (failing) test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <joregan@kth.se> * also handle decades Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add partially incomplete test data Signed-off-by: Jim O'Regan <joregan@kth.se> * mostly fixed test cases 
Signed-off-by: Jim O'Regan <joregan@kth.se> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <joregan@kth.se> * missed wrapping Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. Signed-off-by: Jim O'Regan <joregan@kth.se> * telephone tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * try adding more brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <joregan@kth.se> * move abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add in abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <joregan@kth.se> * single digit Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "this is not right; leading 
zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, this seems to work Signed-off-by: Jim O'Regan <joregan@kth.se> * drop the tests starting with comma Signed-off-by: Jim O'Regan <joregan@kth.se> * decimal tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * lower case Signed-off-by: Jim O'Regan <joregan@kth.se> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a very minimal test case for time Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <joregan@kth.se> * add prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * copy the roman handling from es Signed-off-by: Jim O'Regan <joregan@kth.se> * greek letters Signed-off-by: Jim O'Regan <joregan@kth.se> * some fixes to the time tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on time Signed-off-by: Jim O'Regan <joregan@kth.se> * |=, not = Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt verbaliser a little Signed-off-by: Jim O'Regan <joregan@kth.se> * add some test cases from module comments Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables to check Signed-off-by: Jim O'Regan <joregan@kth.se> * small fix Signed-off-by: Jim O'Regan <joregan@kth.se> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <joregan@kth.se> * try doing this here Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix errors in tests Signed-off-by: Jim O'Regan <joregan@kth.se> * minimal test cases for measure Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <joregan@kth.se> * merge different tsvs Signed-off-by: Jim O'Regan <joregan@kth.se> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables for testing Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * need an en/ett split here too Signed-off-by: Jim O'Regan <joregan@kth.se> * fix decimal subgraph Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo, I've just done it Signed-off-by: Jim O'Regan <joregan@kth.se> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek letters in maths Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek here too Signed-off-by: Jim O'Regan <joregan@kth.se> * minor sg/pl Signed-off-by: Jim O'Regan <joregan@kth.se> * dedup Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * put these under if, too Signed-off-by: Jim O'Regan <joregan@kth.se> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <joregan@kth.se> * export variables to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * 
here is one error Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <joregan@kth.se> * export a variable Signed-off-by: Jim O'Regan <joregan@kth.se> * add a tesst case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * . is not a cardinal separator Signed-off-by: Jim O'Regan <joregan@kth.se> * fix case Signed-off-by: Jim O'Regan <joregan@kth.se> * add yen Signed-off-by: Jim O'Regan <joregan@kth.se> * final fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove English roman tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * remove some unused pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <joregan@kth.se> * warnings about missing whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * add sv Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan 
<joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year Signed-off-by: Jim O'Regan <joregan@kth.se> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <joregan@kth.se> * address codeql comments Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <joregan@kth.se> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <joregan@kth.se> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <joregan@kth.se> * remove broken duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <joregan@kth.se> * time tests now pass Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <joregan@kth.se> * import delete_preserve_order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <joregan@kth.se> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <joregan@kth.se> * move to the correct subdirectory Signed-off-by: Jim O'Regan <joregan@kth.se> * add swedish Signed-off-by: Jim O'Regan <joregan@kth.se> * remove pynini checks Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix here also Signed-off-by: Jim O'Regan <joregan@kth.se> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * add a date case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove duplication 
Signed-off-by: Jim O'Regan <joregan@kth.se> * boost n_tagged Signed-off-by: Jim O'Regan <joregan@kth.se> * also copyright this year Signed-off-by: Jim O'Regan <joregan@kth.se> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <joregan@kth.se> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <joregan@kth.se> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <joregan@kth.se> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * days of the week Signed-off-by: Jim O'Regan <joregan@kth.se> * add more abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove blank line Signed-off-by: Jim O'Regan <joregan@kth.se> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <joregan@kth.se> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CI setup (#25) * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci _cr Signed-off-by: ekmb <ebakhturina@nvidia.com> * revert setup tool Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove pytest-runner from setup.py Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update 
test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip Signed-off-by: ekmb <ebakhturina@nvidia.com> * electronic pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * test pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove unused imports Signed-off-by: ekmb <ebakhturina@nvidia.com> * add deterministic option normalized options Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins grammar folder Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up, update for SH Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * reduce cardinal graph Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * add weight for sh Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Tn en astronomical no (#28) * Add 
support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix stage Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <ebakhturina@nvidia.com> * add whitelist to export Signed-off-by: ekmb <ebakhturina@nvidia.com> * update docstrings Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix for measures Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> --------- Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang 
<yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.6rc0 (#37) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run language tests in stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update DE cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <joregan@kth.se> 
Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix telephone, ordinal Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * update electronic Signed-off-by: ekmb <ebakhturina@nvidia.com> * review feedback, update whitelist Signed-off-by: ekmb <ebakhturina@nvidia.com> * rename capitalize func Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix SH tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins folder name Signed-off-by: ekmb <ebakhturina@nvidia.com> * added cased arg to ITN Signed-off-by: ekmb <ebakhturina@nvidia.com> * add input_case arg to other lang Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dirs update Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix codeql errors Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix sh Signed-off-by: ekmb <ebakhturina@nvidia.com> * review Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder for EN Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * extend alignment for itn Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added test to pr doc Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <joregan@kth.se> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <joregan@kth.se> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan 
<joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix sv tests (#52) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.7 release (#53) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update Jenkinsfile Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for quantities Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * change integer Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <joregan@kth.se> * superscript to superessive Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes 
from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * fix var Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum electronic test Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <joregan@kth.se> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add some alternative measure forms Signed-off-by: Jim O'Regan <joregan@kth.se> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal time test Signed-off-by: Jim O'Regan <joregan@kth.se> * will want cardinal here Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <joregan@kth.se> * move two letters Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * small changes Signed-off-by: Jim O'Regan <joregan@kth.se> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * other ways of reading w Signed-off-by: Jim O'Regan <joregan@kth.se> * for non deterministic, a bunch of these symbols 
can be read as letters Signed-off-by: Jim O'Regan <joregan@kth.se> * currency Signed-off-by: Jim O'Regan <joregan@kth.se> * more inflection Signed-off-by: Jim O'Regan <joregan@kth.se> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * working now, add a comment Signed-off-by: Jim O'Regan <joregan@kth.se> * also integer, and preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * also accept the full words Signed-off-by: Jim O'Regan <joregan@kth.se> * deduplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt comments Signed-off-by: Jim O'Regan <joregan@kth.se> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <joregan@kth.se> * duplicate space Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * actually saving the adaptations Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan 
<joregan@kth.se> * remove pynini checks from tests Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix cache dir Signed-off-by: Jim O'Regan <joregan@kth.se> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add basic tests (native verified) Signed-off-by: Jim O'Regan <joregan@kth.se> * add components for read digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add an example with a different separator Signed-off-by: Jim O'Regan <joregan@kth.se> * start adapting Signed-off-by: Jim O'Regan <joregan@kth.se> * add 2-digit area codes Signed-off-by: Jim O'Regan <joregan@kth.se> * add another Signed-off-by: Jim O'Regan <joregan@kth.se> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <joregan@kth.se> * export var Signed-off-by: Jim O'Regan <joregan@kth.se> * in progress Signed-off-by: Jim O'Regan <joregan@kth.se> * country codes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <joregan@kth.se> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * nominal digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add IP prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * add google copyright notice; probably meaningless Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on telephone 
Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix path Signed-off-by: Jim O'Regan <joregan@kth.se> * minor adaptation; more needed Signed-off-by: Jim O'Regan <joregan@kth.se> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt more Signed-off-by: Jim O'Regan <joregan@kth.se> * nearly there Signed-off-by: Jim O'Regan <joregan@kth.se> * replace with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * extend tests Signed-off-by: Jim O'Regan <joregan@kth.se> * some tweaks Signed-off-by: Jim O'Regan <joregan@kth.se> * add an IP test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * move variables Signed-off-by: Jim O'Regan <joregan@kth.se> * filter ordinals Signed-off-by: Jim O'Regan <joregan@kth.se> * basic fraction tests Signed-off-by: Jim O'Regan <joregan@kth.se> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <joregan@kth.se> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <joregan@kth.se> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test, including spaces Signed-off-by: Jim O'Regan <joregan@kth.se> * works in the repl, not in reality Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <joregan@kth.se> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test for that Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <joregan@kth.se> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <joregan@kth.se> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <joregan@kth.se> * swapping order Signed-off-by: Jim O'Regan <joregan@kth.se> * more swapping Signed-off-by: Jim O'Regan <joregan@kth.se> * remove import Signed-off-by: Jim O'Regan <joregan@kth.se> * 
add an example Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <joregan@kth.se> * some things fixed Signed-off-by: Jim O'Regan <joregan@kth.se> * more adjustments to time Signed-off-by: Jim O'Regan <joregan@kth.se> * more todo, but working for this subset Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq Signed-off-by: Jim O'Regan <joregan@kth.se> * timezone can be inflected too Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <joregan@kth.se> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <joregan@kth.se> * fix the commented ITN part Signed-off-by: Jim O'Regan <joregan@kth.se> * add hu Signed-off-by: Jim O'Regan <joregan@kth.se> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <joregan@kth.se> * fix measure cardinals Signed-off-by: Jim O'Regan <joregan@kth.se> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <joregan@kth.se> * missed removing preserver_order Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add öre (also for NOK) Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> * Comment line, for now Signed-off-by: Jim O’Regan <joregan@kth.se> * try breaking this into pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a bug in cardinal graph 
Signed-off-by: Jim O'Regan <joregan@kth.se> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <joregan@kth.se> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <joregan@kth.se> * add [be]os_or_space Signed-off-by: Jim O'Regan <joregan@kth.se> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. Signed-off-by: Jim O'Regan <joregan@kth.se> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <joregan@kth.se> * see if this makes a difference Signed-off-by: Jim O'Regan <joregan@kth.se> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <joregan@kth.se> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <joregan@kth.se> * try again Signed-off-by: Jim O'Regan <joregan@kth.se> * move that thing, merge some lines Signed-off-by: Jim O'Regan <joregan@kth.se> * at least it fails quickly Signed-off-by: Jim O'Regan <joregan@kth.se> * export original Signed-off-by: Jim O'Regan <joregan@kth.se> * move things around for no real reason Signed-off-by: Jim O'Regan <joregan@kth.se> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <joregan@kth.se> * try this again Signed-off-by: Jim O'Regan <joregan@kth.se> * pretty sure this should work. As should the other Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, try here Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <joregan@kth.se> * change the variable names Signed-off-by: Jim O'Regan <joregan@kth.se> * get rid of duplicate input print Signed-off-by: Jim O'Regan <joregan@kth.se> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <joregan@kth.se> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <joregan@kth.se> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <joregan@kth.se> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * rearrange slightly Signed-off-by: Jim O'Regan <joregan@kth.se> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add comment for testing multiple shortest paths 
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <joregan@kth.se> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <joregan@kth.se> * whitespace fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * also fix in the verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * Update Jenkinsfile Signed-off-by: Jim O’Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <enno.hermann@idiap.ch> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add inits Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request on Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal changes will change back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn date Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving conflict Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases Signed-off-by: Alex Cui <alexcui1994@gmail.com> * updats on Jenkins Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jenkinspdate Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding one more test item Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixed typo on decimaltext Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed 
grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unused import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changed regular space to narrow space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports error fixing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports errors Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Jekins update for jp itn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * reverting Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixng style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jp tn date update Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * removing previously created nemo imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * test order arrangement Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolve fraction space issue Signed-off-by: Alex Cui 
<alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * empty file Signed-off-by: Alex Cui <alexcui1994@gmail.com> * to delete Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * add Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add jenkins file (#23) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * force ordinals to either have :a/:e or "." 
at the end Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal ordinal data Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add // to symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix language Signed-off-by: Jim O'Regan <joregan@kth.se> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for 
measure, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a pair of test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix plurals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add usd$ Signed-off-by: Jim O'Regan <joregan@kth.se> * insert "komma" Signed-off-by: Jim O'Regan <joregan@kth.se> * "pund" is neuter Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * towards proper graphs Signed-off-by: Jim O'Regan <joregan@kth.se> * GBP Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * make komma non-det Signed-off-by: Jim O'Regan <joregan@kth.se> * more money tagger fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <joregan@kth.se> * do a bit better with en/ett Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <joregan@kth.se> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * expansions of era abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras in verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * fix examples in comment Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <joregan@kth.se> * fix separator Signed-off-by: Jim O'Regan <joregan@kth.se> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <joregan@kth.se> * load labels Signed-off-by: Jim O'Regan <joregan@kth.se> * right first time Signed-off-by: Jim O'Regan <joregan@kth.se> * missing space Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year in test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * getting closer to getting dates working Signed-off-by: Jim O'Regan <joregan@kth.se> * add a (failing) test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <joregan@kth.se> * also handle decades Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add partially incomplete test data Signed-off-by: Jim O'Regan <joregan@kth.se> * mostly fixed test cases 
Signed-off-by: Jim O'Regan <joregan@kth.se> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <joregan@kth.se> * missed wrapping Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. Signed-off-by: Jim O'Regan <joregan@kth.se> * telephone tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * try adding more brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <joregan@kth.se> * move abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add in abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <joregan@kth.se> * single digit Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "this is not right; leading 
zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, this seems to work Signed-off-by: Jim O'Regan <joregan@kth.se> * drop the tests starting with comma Signed-off-by: Jim O'Regan <joregan@kth.se> * decimal tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * lower case Signed-off-by: Jim O'Regan <joregan@kth.se> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a very minimal test case for time Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <joregan@kth.se> * add prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * copy the roman handling from es Signed-off-by: Jim O'Regan <joregan@kth.se> * greek letters Signed-off-by: Jim O'Regan <joregan@kth.se> * some fixes to the time tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on time Signed-off-by: Jim O'Regan <joregan@kth.se> * |=, not = Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt verbaliser a little Signed-off-by: Jim O'Regan <joregan@kth.se> * add some test cases from module comments Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables to check Signed-off-by: Jim O'Regan <joregan@kth.se> * small fix Signed-off-by: Jim O'Regan <joregan@kth.se> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <joregan@kth.se> * try doing this here Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix errors in tests Signed-off-by: Jim O'Regan <joregan@kth.se> * minimal test cases for measure Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <joregan@kth.se> * merge different tsvs Signed-off-by: Jim O'Regan <joregan@kth.se> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables for testing Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * need an en/ett split here too Signed-off-by: Jim O'Regan <joregan@kth.se> * fix decimal subgraph Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo, I've just done it Signed-off-by: Jim O'Regan <joregan@kth.se> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek letters in maths Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek here too Signed-off-by: Jim O'Regan <joregan@kth.se> * minor sg/pl Signed-off-by: Jim O'Regan <joregan@kth.se> * dedup Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * put these under if, too Signed-off-by: Jim O'Regan <joregan@kth.se> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <joregan@kth.se> * export variables to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * 
here is one error Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <joregan@kth.se> * export a variable Signed-off-by: Jim O'Regan <joregan@kth.se> * add a tesst case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * . is not a cardinal separator Signed-off-by: Jim O'Regan <joregan@kth.se> * fix case Signed-off-by: Jim O'Regan <joregan@kth.se> * add yen Signed-off-by: Jim O'Regan <joregan@kth.se> * final fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove English roman tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * remove some unused pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <joregan@kth.se> * warnings about missing whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * add sv Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan 
<joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year Signed-off-by: Jim O'Regan <joregan@kth.se> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhawk tests -- untested (: Signed-off-by: Jim O'Regan <joregan@kth.se> * address codeql comments Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <joregan@kth.se> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <joregan@kth.se> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <joregan@kth.se> * remove broken duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <joregan@kth.se> * time tests now pass Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <joregan@kth.se> * import delete_preserve_order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <joregan@kth.se> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <joregan@kth.se> * move to the correct subdirectory Signed-off-by: Jim O'Regan <joregan@kth.se> * add swedish Signed-off-by: Jim O'Regan <joregan@kth.se> * remove pynini checks Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix here also Signed-off-by: Jim O'Regan <joregan@kth.se> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * add a date case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove duplication 
Signed-off-by: Jim O'Regan <joregan@kth.se> * boost n_tagged Signed-off-by: Jim O'Regan <joregan@kth.se> * also copyright this year Signed-off-by: Jim O'Regan <joregan@kth.se> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <joregan@kth.se> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <joregan@kth.se> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <joregan@kth.se> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * days of the week Signed-off-by: Jim O'Regan <joregan@kth.se> * add more abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove blank line Signed-off-by: Jim O'Regan <joregan@kth.se> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <joregan@kth.se> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CI setup (#25) * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci _cr Signed-off-by: ekmb <ebakhturina@nvidia.com> * revert setup tool Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove pytest-runner from setup.py Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update 
test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip Signed-off-by: ekmb <ebakhturina@nvidia.com> * electronic pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * test pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove unused imports Signed-off-by: ekmb <ebakhturina@nvidia.com> * add deterministic option normalized options Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins grammar folder Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up, update for SH Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * reduce cardinal graph Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * add weight for sh Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Tn en astronomical no (#28) * Add 
support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix stage Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <ebakhturina@nvidia.com> * add whitelist to export Signed-off-by: ekmb <ebakhturina@nvidia.com> * update docstrings Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix for measures Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> --------- Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang 
<yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.6rc0 (#37) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run language tests in stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update DE cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <joregan@kth.se> 
Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix telephone, ordinal Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * update electronic Signed-off-by: ekmb <ebakhturina@nvidia.com> * review feedback, update whitelist Signed-off-by: ekmb <ebakhturina@nvidia.com> * rename capitalize func Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix SH tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins folder name Signed-off-by: ekmb <ebakhturina@nvidia.com> * added cased arg to ITN Signed-off-by: ekmb <ebakhturina@nvidia.com> * add input_case arg to other lang Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dirs update Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix codeql errors Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix sh Signed-off-by: ekmb <ebakhturina@nvidia.com> * review Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder for EN Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * extend alignment for itn Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added test to pr doc Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <joregan@kth.se> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <joregan@kth.se> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan 
<joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix sv tests (#52) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.7 release (#53) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update Jenkinsfile Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for quantities Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * change integer Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <joregan@kth.se> * superscript to superessive Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes 
from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * fix var Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum electronic test Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <joregan@kth.se> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add some alternative measure forms Signed-off-by: Jim O'Regan <joregan@kth.se> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal time test Signed-off-by: Jim O'Regan <joregan@kth.se> * will want cardinal here Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <joregan@kth.se> * move two letters Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * small changes Signed-off-by: Jim O'Regan <joregan@kth.se> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * other ways of reading w Signed-off-by: Jim O'Regan <joregan@kth.se> * for non deterministic, a bunch of these symbols 
can be read as letters Signed-off-by: Jim O'Regan <joregan@kth.se> * currency Signed-off-by: Jim O'Regan <joregan@kth.se> * more inflection Signed-off-by: Jim O'Regan <joregan@kth.se> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * working now, add a comment Signed-off-by: Jim O'Regan <joregan@kth.se> * also integer, and preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * also accept the full words Signed-off-by: Jim O'Regan <joregan@kth.se> * deduplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt comments Signed-off-by: Jim O'Regan <joregan@kth.se> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <joregan@kth.se> * duplicate space Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * actually saving the adaptations Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan 
<joregan@kth.se> * remove pynini checks from tests Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix cache dir Signed-off-by: Jim O'Regan <joregan@kth.se> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add basic tests (native verified) Signed-off-by: Jim O'Regan <joregan@kth.se> * add components for read digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add an example with a different separator Signed-off-by: Jim O'Regan <joregan@kth.se> * start adapting Signed-off-by: Jim O'Regan <joregan@kth.se> * add 2-digit area codes Signed-off-by: Jim O'Regan <joregan@kth.se> * add another Signed-off-by: Jim O'Regan <joregan@kth.se> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <joregan@kth.se> * export var Signed-off-by: Jim O'Regan <joregan@kth.se> * in progress Signed-off-by: Jim O'Regan <joregan@kth.se> * country codes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <joregan@kth.se> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * nominal digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add IP prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * add google copyright notice; probably meaningless Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on telephone 
Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix path Signed-off-by: Jim O'Regan <joregan@kth.se> * minor adaptation; more needed Signed-off-by: Jim O'Regan <joregan@kth.se> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt more Signed-off-by: Jim O'Regan <joregan@kth.se> * nearly there Signed-off-by: Jim O'Regan <joregan@kth.se> * replace with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * extend tests Signed-off-by: Jim O'Regan <joregan@kth.se> * some tweaks Signed-off-by: Jim O'Regan <joregan@kth.se> * add an IP test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * move variables Signed-off-by: Jim O'Regan <joregan@kth.se> * filter ordinals Signed-off-by: Jim O'Regan <joregan@kth.se> * basic fraction tests Signed-off-by: Jim O'Regan <joregan@kth.se> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <joregan@kth.se> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <joregan@kth.se> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test, including spaces Signed-off-by: Jim O'Regan <joregan@kth.se> * works in the repl, not in reality Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <joregan@kth.se> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test for that Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <joregan@kth.se> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <joregan@kth.se> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <joregan@kth.se> * swapping order Signed-off-by: Jim O'Regan <joregan@kth.se> * more swapping Signed-off-by: Jim O'Regan <joregan@kth.se> * remove import Signed-off-by: Jim O'Regan <joregan@kth.se> * 
add an example Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <joregan@kth.se> * some things fixed Signed-off-by: Jim O'Regan <joregan@kth.se> * more adjustments to time Signed-off-by: Jim O'Regan <joregan@kth.se> * more todo, but working for this subset Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq Signed-off-by: Jim O'Regan <joregan@kth.se> * timezone can be inflected too Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <joregan@kth.se> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <joregan@kth.se> * fix the commented ITN part Signed-off-by: Jim O'Regan <joregan@kth.se> * add hu Signed-off-by: Jim O'Regan <joregan@kth.se> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <joregan@kth.se> * fix measure cardinals Signed-off-by: Jim O'Regan <joregan@kth.se> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <joregan@kth.se> * missed removing preserver_order Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add öre (also for NOK) Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> * Comment line, for now Signed-off-by: Jim O’Regan <joregan@kth.se> * try breaking this into pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a bug in cardinal 
graph Signed-off-by: Jim O'Regan <joregan@kth.se> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <joregan@kth.se> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <joregan@kth.se> * add [be]os_or_space Signed-off-by: Jim O'Regan <joregan@kth.se> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. Signed-off-by: Jim O'Regan <joregan@kth.se> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <joregan@kth.se> * see if this makes a difference Signed-off-by: Jim O'Regan <joregan@kth.se> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <joregan@kth.se> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <joregan@kth.se> * try again Signed-off-by: Jim O'Regan <joregan@kth.se> * move that thing, merge some lines Signed-off-by: Jim O'Regan <joregan@kth.se> * at least it fails quickly Signed-off-by: Jim O'Regan <joregan@kth.se> * export original Signed-off-by: Jim O'Regan <joregan@kth.se> * move things around for no real reason Signed-off-by: Jim O'Regan <joregan@kth.se> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <joregan@kth.se> * try this again Signed-off-by: Jim O'Regan <joregan@kth.se> * pretty sure this should work. As should the other Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, try here Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <joregan@kth.se> * change the variable names Signed-off-by: Jim O'Regan <joregan@kth.se> * get rid of duplicate input print Signed-off-by: Jim O'Regan <joregan@kth.se> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <joregan@kth.se> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <joregan@kth.se> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <joregan@kth.se> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * rearrange slightly Signed-off-by: Jim O'Regan <joregan@kth.se> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add comment for testing multiple shortest paths 
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <joregan@kth.se> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <joregan@kth.se> * whitespace fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * also fix in the verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * Update Jenkinsfile Signed-off-by: Jim O’Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <enno.hermann@idiap.ch> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add inits Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv
pushed a commit
to ankitnv/NeMo-text-processing
that referenced
this pull request
Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal changes will change back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn date Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving conflict Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases Signed-off-by: Alex Cui <alexcui1994@gmail.com> * updats on Jenkins Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jenkinspdate Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding one more test item Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixed typo on decimaltext Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed 
grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unused import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changed regular space to narrow space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports error fixing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports errors Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Jekins update for jp itn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * reverting Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixng style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jp tn date update Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * removing previously created nemo imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * test order arrangement Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolve fraction space issue Signed-off-by: Alex Cui 
<alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * empty file Signed-off-by: Alex Cui <alexcui1994@gmail.com> * to delete Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * add Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add jenkins file (#23) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * force ordinals to either have :a/:e or "." 
at the end Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal ordinal data Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add // to symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix language Signed-off-by: Jim O'Regan <joregan@kth.se> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for 
measure, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a pair of test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix plurals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add usd$ Signed-off-by: Jim O'Regan <joregan@kth.se> * insert "komma" Signed-off-by: Jim O'Regan <joregan@kth.se> * "pund" is neuter Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * towards proper graphs Signed-off-by: Jim O'Regan <joregan@kth.se> * GBP Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * make komma non-det Signed-off-by: Jim O'Regan <joregan@kth.se> * more money tagger fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <joregan@kth.se> * do a bit better with en/ett Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <joregan@kth.se> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * expansions of era abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras in verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * fix examples in comment Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <joregan@kth.se> * fix separator Signed-off-by: Jim O'Regan <joregan@kth.se> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <joregan@kth.se> * load labels Signed-off-by: Jim O'Regan <joregan@kth.se> * right first time Signed-off-by: Jim O'Regan <joregan@kth.se> * missing space Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year in test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * getting closer to getting dates working Signed-off-by: Jim O'Regan <joregan@kth.se> * add a (failing) test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <joregan@kth.se> * also handle decades Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add partially incomplete test data Signed-off-by: Jim O'Regan <joregan@kth.se> * mostly fixed test cases 
Signed-off-by: Jim O'Regan <joregan@kth.se> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <joregan@kth.se> * missed wrapping Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. Signed-off-by: Jim O'Regan <joregan@kth.se> * telephone tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * try adding more brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <joregan@kth.se> * move abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add in abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <joregan@kth.se> * single digit Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "this is not right; leading 
zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, this seems to work Signed-off-by: Jim O'Regan <joregan@kth.se> * drop the tests starting with comma Signed-off-by: Jim O'Regan <joregan@kth.se> * decimal tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * lower case Signed-off-by: Jim O'Regan <joregan@kth.se> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a very minimal test case for time Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <joregan@kth.se> * add prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * copy the roman handling from es Signed-off-by: Jim O'Regan <joregan@kth.se> * greek letters Signed-off-by: Jim O'Regan <joregan@kth.se> * some fixes to the time tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on time Signed-off-by: Jim O'Regan <joregan@kth.se> * |=, not = Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt verbaliser a little Signed-off-by: Jim O'Regan <joregan@kth.se> * add some test cases from module comments Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables to check Signed-off-by: Jim O'Regan <joregan@kth.se> * small fix Signed-off-by: Jim O'Regan <joregan@kth.se> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <joregan@kth.se> * try doing this here Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix errors in tests Signed-off-by: Jim O'Regan <joregan@kth.se> * minimal test cases for measure Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <joregan@kth.se> * merge different tsvs Signed-off-by: Jim O'Regan <joregan@kth.se> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables for testing Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * need an en/ett split here too Signed-off-by: Jim O'Regan <joregan@kth.se> * fix decimal subgraph Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo, I've just done it Signed-off-by: Jim O'Regan <joregan@kth.se> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek letters in maths Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek here too Signed-off-by: Jim O'Regan <joregan@kth.se> * minor sg/pl Signed-off-by: Jim O'Regan <joregan@kth.se> * dedup Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * put these under if, too Signed-off-by: Jim O'Regan <joregan@kth.se> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <joregan@kth.se> * export variables to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * 
here is one error Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <joregan@kth.se> * export a variable Signed-off-by: Jim O'Regan <joregan@kth.se> * add a tesst case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * . is not a cardinal separator Signed-off-by: Jim O'Regan <joregan@kth.se> * fix case Signed-off-by: Jim O'Regan <joregan@kth.se> * add yen Signed-off-by: Jim O'Regan <joregan@kth.se> * final fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove English roman tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * remove some unused pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <joregan@kth.se> * warnings about missing whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * add sv Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan 
<joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year Signed-off-by: Jim O'Regan <joregan@kth.se> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <joregan@kth.se> * address codeql comments Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <joregan@kth.se> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <joregan@kth.se> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <joregan@kth.se> * remove broken duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <joregan@kth.se> * time tests now pass Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <joregan@kth.se> * import delete_preserve_order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* codeql feedback Signed-off-by: Jim O'Regan <joregan@kth.se> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <joregan@kth.se> * move to the correct subdirectory Signed-off-by: Jim O'Regan <joregan@kth.se> * add swedish Signed-off-by: Jim O'Regan <joregan@kth.se> * remove pynini checks Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix here also Signed-off-by: Jim O'Regan <joregan@kth.se> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * add a date case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove duplication Signed-off-by: Jim O'Regan <joregan@kth.se> * boost n_tagged Signed-off-by: Jim O'Regan <joregan@kth.se> * also copyright this year Signed-off-by: Jim O'Regan <joregan@kth.se> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <joregan@kth.se> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <joregan@kth.se> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * days of the week Signed-off-by: Jim O'Regan <joregan@kth.se> * add more abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove blank line Signed-off-by: Jim O'Regan <joregan@kth.se> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <joregan@kth.se> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CI setup (#25) * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci _cr Signed-off-by: ekmb <ebakhturina@nvidia.com> * revert setup tool Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove pytest-runner from setup.py Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb 
<ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip Signed-off-by: ekmb <ebakhturina@nvidia.com> * electronic pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * test pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove unused imports Signed-off-by: ekmb <ebakhturina@nvidia.com> * add deterministic option normalized options Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins grammar folder Signed-off-by: ekmb 
<ebakhturina@nvidia.com> * clean up, update for SH Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * reduce cardinal graph Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * add weight for sh Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Tn en astronomical no (#28) * Add support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix stage Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <ebakhturina@nvidia.com> * add whitelist to export Signed-off-by: ekmb <ebakhturina@nvidia.com> * update docstrings Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex 
Cui <alexcui1994@gmail.com> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix for measures Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> --------- Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang 
<yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.6rc0 (#37) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run language tests in stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update DE cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix misssing bracket, add ZH Signed-off-by: 
Anand Joseph <anajoseph@nvidia.com> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix telephone, ordinal Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * update electronic Signed-off-by: ekmb <ebakhturina@nvidia.com> * review feedback, update whitelist Signed-off-by: ekmb <ebakhturina@nvidia.com> * rename capitalize func Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix SH tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins folder name Signed-off-by: ekmb <ebakhturina@nvidia.com> * added cased arg to ITN Signed-off-by: ekmb <ebakhturina@nvidia.com> * add input_case arg to other lang Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dirs update Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test 
Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix codeql errors Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix sh Signed-off-by: ekmb <ebakhturina@nvidia.com> * review Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder for EN Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: 
Anand Joseph <anajoseph@nvidia.com> * Insert space between measurement and unit. Adjust weight for ordinal Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * extend alignment for itn Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added test to pr doc Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <joregan@kth.se> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <joregan@kth.se> * fraction fix from ITN branch 
Signed-off-by: Jim O'Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix sv tests (#52) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.7 release (#53) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update Jenkinsfile Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for quantities Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * change integer Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <joregan@kth.se> * superscript to 
superessive Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * fix var Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum electronic test Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <joregan@kth.se> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add some alternative measure forms Signed-off-by: Jim O'Regan <joregan@kth.se> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal time test Signed-off-by: Jim O'Regan <joregan@kth.se> * will want cardinal here Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <joregan@kth.se> * move two letters Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * small changes Signed-off-by: Jim O'Regan <joregan@kth.se> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * other ways of reading w 
Signed-off-by: Jim O'Regan <joregan@kth.se> * for non deterministic, a bunch of these symbols can be read as letters Signed-off-by: Jim O'Regan <joregan@kth.se> * currency Signed-off-by: Jim O'Regan <joregan@kth.se> * more inflection Signed-off-by: Jim O'Regan <joregan@kth.se> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * working now, add a comment Signed-off-by: Jim O'Regan <joregan@kth.se> * also integer, and preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * also accept the full words Signed-off-by: Jim O'Regan <joregan@kth.se> * deduplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt comments Signed-off-by: Jim O'Regan <joregan@kth.se> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <joregan@kth.se> * duplicate space Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * actually saving the adaptations Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks 
for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan <joregan@kth.se> * remove pynini checks from tests Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix cache dir Signed-off-by: Jim O'Regan <joregan@kth.se> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add basic tests (native verified) Signed-off-by: Jim O'Regan <joregan@kth.se> * add components for read digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add an example with a different separator Signed-off-by: Jim O'Regan <joregan@kth.se> * start adapting Signed-off-by: Jim O'Regan <joregan@kth.se> * add 2-digit area codes Signed-off-by: Jim O'Regan <joregan@kth.se> * add another Signed-off-by: Jim O'Regan <joregan@kth.se> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <joregan@kth.se> * export var Signed-off-by: Jim O'Regan <joregan@kth.se> * in progress Signed-off-by: Jim O'Regan <joregan@kth.se> * country codes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <joregan@kth.se> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * nominal digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add IP prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * add google copyright notice; 
probably meaningless Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on telephone Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix path Signed-off-by: Jim O'Regan <joregan@kth.se> * minor adaptation; more needed Signed-off-by: Jim O'Regan <joregan@kth.se> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt more Signed-off-by: Jim O'Regan <joregan@kth.se> * nearly there Signed-off-by: Jim O'Regan <joregan@kth.se> * replace with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * extend tests Signed-off-by: Jim O'Regan <joregan@kth.se> * some tweaks Signed-off-by: Jim O'Regan <joregan@kth.se> * add an IP test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * move variables Signed-off-by: Jim O'Regan <joregan@kth.se> * filter ordinals Signed-off-by: Jim O'Regan <joregan@kth.se> * basic fraction tests Signed-off-by: Jim O'Regan <joregan@kth.se> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <joregan@kth.se> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <joregan@kth.se> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test, including spaces Signed-off-by: Jim O'Regan <joregan@kth.se> * works in the repl, not in reality Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <joregan@kth.se> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test for that Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <joregan@kth.se> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <joregan@kth.se> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <joregan@kth.se> * swapping order Signed-off-by: Jim O'Regan <joregan@kth.se> * more swapping Signed-off-by: Jim O'Regan <joregan@kth.se> * remove import Signed-off-by: Jim O'Regan <joregan@kth.se> * 
add an example Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <joregan@kth.se> * some things fixed Signed-off-by: Jim O'Regan <joregan@kth.se> * more adjustments to time Signed-off-by: Jim O'Regan <joregan@kth.se> * more todo, but working for this subset Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq Signed-off-by: Jim O'Regan <joregan@kth.se> * timezone can be inflected too Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <joregan@kth.se> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <joregan@kth.se> * fix the commented ITN part Signed-off-by: Jim O'Regan <joregan@kth.se> * add hu Signed-off-by: Jim O'Regan <joregan@kth.se> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <joregan@kth.se> * fix measure cardinals Signed-off-by: Jim O'Regan <joregan@kth.se> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <joregan@kth.se> * missed removing preserver_order Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add öre (also for NOK) Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> * Comment line, for now Signed-off-by: Jim O’Regan <joregan@kth.se> * try breaking this into pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a bug in cardinal graph 
Signed-off-by: Jim O'Regan <joregan@kth.se> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <joregan@kth.se> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <joregan@kth.se> * add [be]os_or_space Signed-off-by: Jim O'Regan <joregan@kth.se> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. Signed-off-by: Jim O'Regan <joregan@kth.se> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <joregan@kth.se> * see if this makes a difference Signed-off-by: Jim O'Regan <joregan@kth.se> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <joregan@kth.se> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <joregan@kth.se> * try again Signed-off-by: Jim O'Regan <joregan@kth.se> * move that thing, merge some lines Signed-off-by: Jim O'Regan <joregan@kth.se> * at least it fails quickly Signed-off-by: Jim O'Regan <joregan@kth.se> * export original Signed-off-by: Jim O'Regan <joregan@kth.se> * move things around for no real reason Signed-off-by: Jim O'Regan <joregan@kth.se> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <joregan@kth.se> * try this again Signed-off-by: Jim O'Regan <joregan@kth.se> * pretty sure this should work. As should the other Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, try here Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <joregan@kth.se> * change the variable names Signed-off-by: Jim O'Regan <joregan@kth.se> * get rid of duplicate input print Signed-off-by: Jim O'Regan <joregan@kth.se> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <joregan@kth.se> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <joregan@kth.se> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <joregan@kth.se> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * rearrange slightly Signed-off-by: Jim O'Regan <joregan@kth.se> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add comment for testing multiple shortest paths 
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <joregan@kth.se> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <joregan@kth.se> * whitespace fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * also fix in the verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * Update Jenkinsfile Signed-off-by: Jim O’Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <enno.hermann@idiap.ch> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add inits Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request on Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal changes will change back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn date Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving conflict Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases Signed-off-by: Alex Cui <alexcui1994@gmail.com> * updats on Jenkins Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jenkinspdate Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding one more test item Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixed typo on decimaltext Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed 
grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unused import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changed regular space to narrow space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports error fixing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports errors Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Jekins update for jp itn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * reverting Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixng style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jp tn date update Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * removing previously created nemo imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * test order arrangement Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolve fraction space issue Signed-off-by: Alex Cui 
<alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * empty file Signed-off-by: Alex Cui <alexcui1994@gmail.com> * to delete Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * add Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add jenkins file (#23) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * force ordinals to either have :a/:e or "." 
at the end Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal ordinal data Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add // to symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix language Signed-off-by: Jim O'Regan <joregan@kth.se> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for 
measure, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a pair of test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix plurals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add usd$ Signed-off-by: Jim O'Regan <joregan@kth.se> * insert "komma" Signed-off-by: Jim O'Regan <joregan@kth.se> * "pund" is neuter Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * towards proper graphs Signed-off-by: Jim O'Regan <joregan@kth.se> * GBP Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * make komma non-det Signed-off-by: Jim O'Regan <joregan@kth.se> * more money tagger fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <joregan@kth.se> * do a bit better with en/ett Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <joregan@kth.se> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * expansions of era abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras in verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * fix examples in comment Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <joregan@kth.se> * fix separator Signed-off-by: Jim O'Regan <joregan@kth.se> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <joregan@kth.se> * load labels Signed-off-by: Jim O'Regan <joregan@kth.se> * right first time Signed-off-by: Jim O'Regan <joregan@kth.se> * missing space Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year in test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * getting closer to getting dates working Signed-off-by: Jim O'Regan <joregan@kth.se> * add a (failing) test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <joregan@kth.se> * also handle decades Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add partially incomplete test data Signed-off-by: Jim O'Regan <joregan@kth.se> * mostly fixed test cases 
Signed-off-by: Jim O'Regan <joregan@kth.se> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <joregan@kth.se> * missed wrapping Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. Signed-off-by: Jim O'Regan <joregan@kth.se> * telephone tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * try adding more brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <joregan@kth.se> * move abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add in abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <joregan@kth.se> * single digit Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "this is not right; leading 
zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, this seems to work Signed-off-by: Jim O'Regan <joregan@kth.se> * drop the tests starting with comma Signed-off-by: Jim O'Regan <joregan@kth.se> * decimal tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * lower case Signed-off-by: Jim O'Regan <joregan@kth.se> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a very minimal test case for time Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <joregan@kth.se> * add prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * copy the roman handling from es Signed-off-by: Jim O'Regan <joregan@kth.se> * greek letters Signed-off-by: Jim O'Regan <joregan@kth.se> * some fixes to the time tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on time Signed-off-by: Jim O'Regan <joregan@kth.se> * |=, not = Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt verbaliser a little Signed-off-by: Jim O'Regan <joregan@kth.se> * add some test cases from module comments Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables to check Signed-off-by: Jim O'Regan <joregan@kth.se> * small fix Signed-off-by: Jim O'Regan <joregan@kth.se> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <joregan@kth.se> * try doing this here Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix errors in tests Signed-off-by: Jim O'Regan <joregan@kth.se> * minimal test cases for measure Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <joregan@kth.se> * merge different tsvs Signed-off-by: Jim O'Regan <joregan@kth.se> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables for testing Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * need an en/ett split here too Signed-off-by: Jim O'Regan <joregan@kth.se> * fix decimal subgraph Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo, I've just done it Signed-off-by: Jim O'Regan <joregan@kth.se> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek letters in maths Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek here too Signed-off-by: Jim O'Regan <joregan@kth.se> * minor sg/pl Signed-off-by: Jim O'Regan <joregan@kth.se> * dedup Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * put these under if, too Signed-off-by: Jim O'Regan <joregan@kth.se> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <joregan@kth.se> * export variables to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * 
here is one error Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <joregan@kth.se> * export a variable Signed-off-by: Jim O'Regan <joregan@kth.se> * add a tesst case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * . is not a cardinal separator Signed-off-by: Jim O'Regan <joregan@kth.se> * fix case Signed-off-by: Jim O'Regan <joregan@kth.se> * add yen Signed-off-by: Jim O'Regan <joregan@kth.se> * final fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove English roman tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * remove some unused pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <joregan@kth.se> * warnings about missing whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * add sv Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan 
<joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year Signed-off-by: Jim O'Regan <joregan@kth.se> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <joregan@kth.se> * address codeql comments Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <joregan@kth.se> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <joregan@kth.se> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <joregan@kth.se> * remove broken duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <joregan@kth.se> * time tests now pass Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <joregan@kth.se> * import delete_preserve_order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <joregan@kth.se> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <joregan@kth.se> * move to the correct subdirectory Signed-off-by: Jim O'Regan <joregan@kth.se> * add swedish Signed-off-by: Jim O'Regan <joregan@kth.se> * remove pynini checks Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix here also Signed-off-by: Jim O'Regan <joregan@kth.se> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * add a date case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove duplication 
Signed-off-by: Jim O'Regan <joregan@kth.se> * boost n_tagged Signed-off-by: Jim O'Regan <joregan@kth.se> * also copyright this year Signed-off-by: Jim O'Regan <joregan@kth.se> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <joregan@kth.se> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <joregan@kth.se> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <joregan@kth.se> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * days of the week Signed-off-by: Jim O'Regan <joregan@kth.se> * add more abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove blank line Signed-off-by: Jim O'Regan <joregan@kth.se> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <joregan@kth.se> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CI setup (#25) * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci _cr Signed-off-by: ekmb <ebakhturina@nvidia.com> * revert setup tool Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove pytest-runner from setup.py Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update 
test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip Signed-off-by: ekmb <ebakhturina@nvidia.com> * electronic pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * test pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove unused imports Signed-off-by: ekmb <ebakhturina@nvidia.com> * add deterministic option normalized options Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins grammar folder Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up, update for SH Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * reduce cardinal graph Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * add weight for sh Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Tn en astronomical no (#28) * Add 
support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix stage Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <ebakhturina@nvidia.com> * add whitelist to export Signed-off-by: ekmb <ebakhturina@nvidia.com> * update docstrings Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix for measures Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> --------- Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang 
<yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.6rc0 (#37) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run language tests in stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update DE cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <joregan@kth.se> 
Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix telephone, ordinal Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * update electronic Signed-off-by: ekmb <ebakhturina@nvidia.com> * review feedback, update whitelist Signed-off-by: ekmb <ebakhturina@nvidia.com> * rename capitalize func Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix SH tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins folder name Signed-off-by: ekmb <ebakhturina@nvidia.com> * added cased arg to ITN Signed-off-by: ekmb <ebakhturina@nvidia.com> * add input_case arg to other lang Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dirs update Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix codeql errors Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix sh Signed-off-by: ekmb <ebakhturina@nvidia.com> * review Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder for EN Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * extend alignment for itn Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added test to pr doc Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <joregan@kth.se> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <joregan@kth.se> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan 
<joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix sv tests (#52) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.7 release (#53) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update Jenkinsfile Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for quantities Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * change integer Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <joregan@kth.se> * superscript to superessive Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes 
from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * fix var Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum electronic test Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <joregan@kth.se> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add some alternative measure forms Signed-off-by: Jim O'Regan <joregan@kth.se> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal time test Signed-off-by: Jim O'Regan <joregan@kth.se> * will want cardinal here Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <joregan@kth.se> * move two letters Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * small changes Signed-off-by: Jim O'Regan <joregan@kth.se> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * other ways of reading w Signed-off-by: Jim O'Regan <joregan@kth.se> * for non deterministic, a bunch of these symbols 
can be read as letters Signed-off-by: Jim O'Regan <joregan@kth.se> * currency Signed-off-by: Jim O'Regan <joregan@kth.se> * more inflection Signed-off-by: Jim O'Regan <joregan@kth.se> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * working now, add a comment Signed-off-by: Jim O'Regan <joregan@kth.se> * also integer, and preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * also accept the full words Signed-off-by: Jim O'Regan <joregan@kth.se> * deduplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt comments Signed-off-by: Jim O'Regan <joregan@kth.se> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <joregan@kth.se> * duplicate space Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * actually saving the adaptations Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan 
<joregan@kth.se> * remove pynini checks from tests Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix cache dir Signed-off-by: Jim O'Regan <joregan@kth.se> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add basic tests (native verified) Signed-off-by: Jim O'Regan <joregan@kth.se> * add components for read digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add an example with a different separator Signed-off-by: Jim O'Regan <joregan@kth.se> * start adapting Signed-off-by: Jim O'Regan <joregan@kth.se> * add 2-digit area codes Signed-off-by: Jim O'Regan <joregan@kth.se> * add another Signed-off-by: Jim O'Regan <joregan@kth.se> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <joregan@kth.se> * export var Signed-off-by: Jim O'Regan <joregan@kth.se> * in progress Signed-off-by: Jim O'Regan <joregan@kth.se> * country codes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <joregan@kth.se> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * nominal digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add IP prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * add google copyright notice; probably meaningless Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on telephone 
Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix path Signed-off-by: Jim O'Regan <joregan@kth.se> * minor adaptation; more needed Signed-off-by: Jim O'Regan <joregan@kth.se> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt more Signed-off-by: Jim O'Regan <joregan@kth.se> * nearly there Signed-off-by: Jim O'Regan <joregan@kth.se> * replace with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * extend tests Signed-off-by: Jim O'Regan <joregan@kth.se> * some tweaks Signed-off-by: Jim O'Regan <joregan@kth.se> * add an IP test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * move variables Signed-off-by: Jim O'Regan <joregan@kth.se> * filter ordinals Signed-off-by: Jim O'Regan <joregan@kth.se> * basic fraction tests Signed-off-by: Jim O'Regan <joregan@kth.se> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <joregan@kth.se> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <joregan@kth.se> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test, including spaces Signed-off-by: Jim O'Regan <joregan@kth.se> * works in the repl, not in reality Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <joregan@kth.se> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test for that Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <joregan@kth.se> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <joregan@kth.se> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <joregan@kth.se> * swapping order Signed-off-by: Jim O'Regan <joregan@kth.se> * more swapping Signed-off-by: Jim O'Regan <joregan@kth.se> * remove import Signed-off-by: Jim O'Regan <joregan@kth.se> * 
add an example Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <joregan@kth.se> * some things fixed Signed-off-by: Jim O'Regan <joregan@kth.se> * more adjustments to time Signed-off-by: Jim O'Regan <joregan@kth.se> * more todo, but working for this subset Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq Signed-off-by: Jim O'Regan <joregan@kth.se> * timezone can be inflected too Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <joregan@kth.se> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <joregan@kth.se> * fix the commented ITN part Signed-off-by: Jim O'Regan <joregan@kth.se> * add hu Signed-off-by: Jim O'Regan <joregan@kth.se> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <joregan@kth.se> * fix measure cardinals Signed-off-by: Jim O'Regan <joregan@kth.se> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <joregan@kth.se> * missed removing preserver_order Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add öre (also for NOK) Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> * Comment line, for now Signed-off-by: Jim O’Regan <joregan@kth.se> * try breaking this into pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a bug in cardinal 
graph Signed-off-by: Jim O'Regan <joregan@kth.se> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <joregan@kth.se> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <joregan@kth.se> * add [be]os_or_space Signed-off-by: Jim O'Regan <joregan@kth.se> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. Signed-off-by: Jim O'Regan <joregan@kth.se> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <joregan@kth.se> * see if this makes a difference Signed-off-by: Jim O'Regan <joregan@kth.se> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <joregan@kth.se> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <joregan@kth.se> * try again Signed-off-by: Jim O'Regan <joregan@kth.se> * move that thing, merge some lines Signed-off-by: Jim O'Regan <joregan@kth.se> * at least it fails quickly Signed-off-by: Jim O'Regan <joregan@kth.se> * export original Signed-off-by: Jim O'Regan <joregan@kth.se> * move things around for no real reason Signed-off-by: Jim O'Regan <joregan@kth.se> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <joregan@kth.se> * try this again Signed-off-by: Jim O'Regan <joregan@kth.se> * pretty sure this should work. As should the other Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, try here Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <joregan@kth.se> * change the variable names Signed-off-by: Jim O'Regan <joregan@kth.se> * get rid of duplicate input print Signed-off-by: Jim O'Regan <joregan@kth.se> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <joregan@kth.se> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <joregan@kth.se> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <joregan@kth.se> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * rearrange slightly Signed-off-by: Jim O'Regan <joregan@kth.se> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add comment for testing multiple shortest paths 
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <joregan@kth.se> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <joregan@kth.se> * whitespace fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * also fix in the verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * Update Jenkinsfile Signed-off-by: Jim O’Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <enno.hermann@idiap.ch> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add inits Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv
added a commit
to ankitnv/NeMo-text-processing
that referenced
this pull request
Oct 28, 2024
* temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal changes will change back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn date Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving conflict Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases Signed-off-by: Alex Cui <alexcui1994@gmail.com> * updats on Jenkins Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jenkinspdate Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding one more test item Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporal fixings attempt to fixn SH test errors, will fix back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases resolving conflicts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * with pynini closure had errors chaing back to no closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixed typo on decimaltext Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed 
grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing unsed improts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unused import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changed regular space to narrow space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports error fixing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports errors Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Jekins update for jp itn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * reverting Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issuel chaing narrow space to regular normal space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixng style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jp tn date update Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * removing previously created nemo imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * test order arrangement Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolve fraction space issue Signed-off-by: Alex Cui 
<alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * empty file Signed-off-by: Alex Cui <alexcui1994@gmail.com> * to delete Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * add Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add jenkins file (#23) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * force ordinals to either have :a/:e or "." 
at the end Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal ordinal data Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add // to symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix language Signed-off-by: Jim O'Regan <joregan@kth.se> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for 
measure, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a pair of test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix plurals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add usd$ Signed-off-by: Jim O'Regan <joregan@kth.se> * insert "komma" Signed-off-by: Jim O'Regan <joregan@kth.se> * "pund" is neuter Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * towards proper graphs Signed-off-by: Jim O'Regan <joregan@kth.se> * GBP Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * make komma non-det Signed-off-by: Jim O'Regan <joregan@kth.se> * more money tagger fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <joregan@kth.se> * do a bit better with en/ett Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <joregan@kth.se> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * expansions of era abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras in verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * fix examples in comment Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <joregan@kth.se> * fix separator Signed-off-by: Jim O'Regan <joregan@kth.se> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <joregan@kth.se> * load labels Signed-off-by: Jim O'Regan <joregan@kth.se> * right first time Signed-off-by: Jim O'Regan <joregan@kth.se> * missing space Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year in test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * getting closer to getting dates working Signed-off-by: Jim O'Regan <joregan@kth.se> * add a (failing) test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <joregan@kth.se> * also handle decades Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add partially incomplete test data Signed-off-by: Jim O'Regan <joregan@kth.se> * mostly fixed test cases 
Signed-off-by: Jim O'Regan <joregan@kth.se> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <joregan@kth.se> * missed wrapping Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. Signed-off-by: Jim O'Regan <joregan@kth.se> * telephone tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * try adding more brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <joregan@kth.se> * move abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add in abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <joregan@kth.se> * single digit Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "this is not right; leading 
zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, this seems to work Signed-off-by: Jim O'Regan <joregan@kth.se> * drop the tests starting with comma Signed-off-by: Jim O'Regan <joregan@kth.se> * decimal tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * lower case Signed-off-by: Jim O'Regan <joregan@kth.se> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a very minimal test case for time Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <joregan@kth.se> * add prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * copy the roman handling from es Signed-off-by: Jim O'Regan <joregan@kth.se> * greek letters Signed-off-by: Jim O'Regan <joregan@kth.se> * some fixes to the time tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on time Signed-off-by: Jim O'Regan <joregan@kth.se> * |=, not = Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt verbaliser a little Signed-off-by: Jim O'Regan <joregan@kth.se> * add some test cases from module comments Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables to check Signed-off-by: Jim O'Regan <joregan@kth.se> * small fix Signed-off-by: Jim O'Regan <joregan@kth.se> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <joregan@kth.se> * try doing this here Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix errors in tests Signed-off-by: Jim O'Regan <joregan@kth.se> * minimal test cases for measure Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <joregan@kth.se> * merge different tsvs Signed-off-by: Jim O'Regan <joregan@kth.se> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables for testing Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * need an en/ett split here too Signed-off-by: Jim O'Regan <joregan@kth.se> * fix decimal subgraph Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo, I've just done it Signed-off-by: Jim O'Regan <joregan@kth.se> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek letters in maths Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek here too Signed-off-by: Jim O'Regan <joregan@kth.se> * minor sg/pl Signed-off-by: Jim O'Regan <joregan@kth.se> * dedup Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * put these under if, too Signed-off-by: Jim O'Regan <joregan@kth.se> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <joregan@kth.se> * export variables to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * 
here is one error Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <joregan@kth.se> * export a variable Signed-off-by: Jim O'Regan <joregan@kth.se> * add a tesst case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * . is not a cardinal separator Signed-off-by: Jim O'Regan <joregan@kth.se> * fix case Signed-off-by: Jim O'Regan <joregan@kth.se> * add yen Signed-off-by: Jim O'Regan <joregan@kth.se> * final fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove English roman tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * remove some unused pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <joregan@kth.se> * warnings about missing whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * add sv Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan 
<joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year Signed-off-by: Jim O'Regan <joregan@kth.se> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <joregan@kth.se> * address codeql comments Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <joregan@kth.se> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <joregan@kth.se> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <joregan@kth.se> * remove broken duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <joregan@kth.se> * time tests now pass Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <joregan@kth.se> * import delete_preserve_order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <joregan@kth.se> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <joregan@kth.se> * move to the correct subdirectory Signed-off-by: Jim O'Regan <joregan@kth.se> * add swedish Signed-off-by: Jim O'Regan <joregan@kth.se> * remove pynini checks Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix here also Signed-off-by: Jim O'Regan <joregan@kth.se> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * add a date case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove duplication 
Signed-off-by: Jim O'Regan <joregan@kth.se> * boost n_tagged Signed-off-by: Jim O'Regan <joregan@kth.se> * also copyright this year Signed-off-by: Jim O'Regan <joregan@kth.se> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <joregan@kth.se> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <joregan@kth.se> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <joregan@kth.se> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * days of the week Signed-off-by: Jim O'Regan <joregan@kth.se> * add more abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove blank line Signed-off-by: Jim O'Regan <joregan@kth.se> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <joregan@kth.se> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CI setup (#25) * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci _cr Signed-off-by: ekmb <ebakhturina@nvidia.com> * revert setup tool Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove pytest-runner from setup.py Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> *
update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip Signed-off-by: ekmb <ebakhturina@nvidia.com> * electronic pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * test pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove unused imports Signed-off-by: ekmb <ebakhturina@nvidia.com> * add deterministic option normalized options Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins grammar folder Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up, update for SH Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * reduce cardinal graph Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * add weight for sh Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Tn en astronomical no (#28) * 
Add support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix stage Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <ebakhturina@nvidia.com> * add whitelist to export Signed-off-by: ekmb <ebakhturina@nvidia.com> * update docstrings Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix for measures Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> --------- Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang 
<yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.6rc0 (#37) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run language tests in stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update DE cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <joregan@kth.se> 
Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix telephone, ordinal Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * update electronic Signed-off-by: ekmb <ebakhturina@nvidia.com> * review feedback, update whitelist Signed-off-by: ekmb <ebakhturina@nvidia.com> * rename capitalize func Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix SH tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins folder name Signed-off-by: ekmb <ebakhturina@nvidia.com> * added cased arg to ITN Signed-off-by: ekmb <ebakhturina@nvidia.com> * add input_case arg to other lang Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dirs update Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix codeql errors Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix sh Signed-off-by: ekmb <ebakhturina@nvidia.com> * review Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder for EN Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * extend alignment for itn Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added test to pr doc Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <joregan@kth.se> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <joregan@kth.se> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan 
<joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix sv tests (#52) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.7 release (#53) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update Jenkinsfile Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for quantities Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * change integer Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <joregan@kth.se> * superscript to superessive Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes 
from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * fix var Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum electronic test Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <joregan@kth.se> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add some alternative measure forms Signed-off-by: Jim O'Regan <joregan@kth.se> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal time test Signed-off-by: Jim O'Regan <joregan@kth.se> * will want cardinal here Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <joregan@kth.se> * move two letters Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * small changes Signed-off-by: Jim O'Regan <joregan@kth.se> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * other ways of reading w Signed-off-by: Jim O'Regan <joregan@kth.se> * for non deterministic, a bunch of these symbols 
can be read as letters Signed-off-by: Jim O'Regan <joregan@kth.se> * currency Signed-off-by: Jim O'Regan <joregan@kth.se> * more inflection Signed-off-by: Jim O'Regan <joregan@kth.se> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * working now, add a comment Signed-off-by: Jim O'Regan <joregan@kth.se> * also integer, and preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * also accept the full words Signed-off-by: Jim O'Regan <joregan@kth.se> * deduplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt comments Signed-off-by: Jim O'Regan <joregan@kth.se> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <joregan@kth.se> * duplicate space Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * actually saving the adaptations Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan 
<joregan@kth.se> * remove pynini checks from tests Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix cache dir Signed-off-by: Jim O'Regan <joregan@kth.se> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add basic tests (native verified) Signed-off-by: Jim O'Regan <joregan@kth.se> * add components for read digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add an example with a different separator Signed-off-by: Jim O'Regan <joregan@kth.se> * start adapting Signed-off-by: Jim O'Regan <joregan@kth.se> * add 2-digit area codes Signed-off-by: Jim O'Regan <joregan@kth.se> * add another Signed-off-by: Jim O'Regan <joregan@kth.se> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <joregan@kth.se> * export var Signed-off-by: Jim O'Regan <joregan@kth.se> * in progress Signed-off-by: Jim O'Regan <joregan@kth.se> * country codes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <joregan@kth.se> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * nominal digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add IP prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * add google copyright notice; probably meaningless Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on telephone 
Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix path Signed-off-by: Jim O'Regan <joregan@kth.se> * minor adaptation; more needed Signed-off-by: Jim O'Regan <joregan@kth.se> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt more Signed-off-by: Jim O'Regan <joregan@kth.se> * nearly there Signed-off-by: Jim O'Regan <joregan@kth.se> * replace with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * extend tests Signed-off-by: Jim O'Regan <joregan@kth.se> * some tweaks Signed-off-by: Jim O'Regan <joregan@kth.se> * add an IP test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * move variables Signed-off-by: Jim O'Regan <joregan@kth.se> * filter ordinals Signed-off-by: Jim O'Regan <joregan@kth.se> * basic fraction tests Signed-off-by: Jim O'Regan <joregan@kth.se> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <joregan@kth.se> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <joregan@kth.se> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test, including spaces Signed-off-by: Jim O'Regan <joregan@kth.se> * works in the repl, not in reality Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <joregan@kth.se> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test for that Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <joregan@kth.se> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <joregan@kth.se> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <joregan@kth.se> * swapping order Signed-off-by: Jim O'Regan <joregan@kth.se> * more swapping Signed-off-by: Jim O'Regan <joregan@kth.se> * remove import Signed-off-by: Jim O'Regan <joregan@kth.se> * 
add an example Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <joregan@kth.se> * some things fixed Signed-off-by: Jim O'Regan <joregan@kth.se> * more adjustments to time Signed-off-by: Jim O'Regan <joregan@kth.se> * more todo, but working for this subset Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq Signed-off-by: Jim O'Regan <joregan@kth.se> * timezone can be inflected too Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhawk test (todo) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <joregan@kth.se> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <joregan@kth.se> * fix the commented ITN part Signed-off-by: Jim O'Regan <joregan@kth.se> * add hu Signed-off-by: Jim O'Regan <joregan@kth.se> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <joregan@kth.se> * fix measure cardinals Signed-off-by: Jim O'Regan <joregan@kth.se> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <joregan@kth.se> * missed removing preserver_order Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add öre (also for NOK) Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> * Comment line, for now Signed-off-by: Jim O’Regan <joregan@kth.se> * try breaking this into pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a bug in cardinal graph 
Signed-off-by: Jim O'Regan <joregan@kth.se> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <joregan@kth.se> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <joregan@kth.se> * add [be]os_or_space Signed-off-by: Jim O'Regan <joregan@kth.se> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. Signed-off-by: Jim O'Regan <joregan@kth.se> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <joregan@kth.se> * see if this makes a difference Signed-off-by: Jim O'Regan <joregan@kth.se> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <joregan@kth.se> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <joregan@kth.se> * try again Signed-off-by: Jim O'Regan <joregan@kth.se> * move that thing, merge some lines Signed-off-by: Jim O'Regan <joregan@kth.se> * at least it fails quickly Signed-off-by: Jim O'Regan <joregan@kth.se> * export original Signed-off-by: Jim O'Regan <joregan@kth.se> * move things around for no real reason Signed-off-by: Jim O'Regan <joregan@kth.se> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <joregan@kth.se> * try this again Signed-off-by: Jim O'Regan <joregan@kth.se> * pretty sure this should work. As should the other Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, try here Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <joregan@kth.se> * change the variable names Signed-off-by: Jim O'Regan <joregan@kth.se> * get rid of duplicate input print Signed-off-by: Jim O'Regan <joregan@kth.se> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <joregan@kth.se> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <joregan@kth.se> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <joregan@kth.se> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * rearrange slightly Signed-off-by: Jim O'Regan <joregan@kth.se> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add comment for testing multiple shortest paths 
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <joregan@kth.se> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <joregan@kth.se> * whitespace fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * also fix in the verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * Update Jenkinsfile Signed-off-by: Jim O’Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <enno.hermann@idiap.ch> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add inits Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to fix SH test errors, will change back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporary changes, will change back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn date Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving conflict Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases Signed-off-by: Alex Cui <alexcui1994@gmail.com> * updates on Jenkins Signed-off-by: Alex Cui <alexcui1994@gmail.com> * pynini closure had errors, changing back to no-closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Jenkins update Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changing the data format, to align to the blind test data Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding one more test item Signed-off-by: Alex Cui <alexcui1994@gmail.com> * temporary fixes attempting to fix SH test errors, will change back Signed-off-by: Alex Cui <alexcui1994@gmail.com> * adding grammars back in the tokenizer Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing ci test cases, resolving conflicts Signed-off-by: Alex Cui <alexcui1994@gmail.com> * pynini closure had errors, changing back to no-closure version Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolving space fraction issue, added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixed typo on decimaltext Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unused grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unused 
grammar Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing unused imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unused import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * changed regular space to narrow space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports error fixing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * imports errors Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Jenkins update for jp itn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * reverting Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update for fraction space issue, changing narrow space to regular space Signed-off-by: Alex Cui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixing style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unused imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jp tn date update Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update test_cases_fraction.txt Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * removing previously created nemo imports Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * test order arrangement Signed-off-by: Alex Cui <alexcui1994@gmail.com> * resolve fraction space issue Signed-off-by: Alex Cui 
<alexcui1994@gmail.com> * style fix Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix style Signed-off-by: Alex Cui <alexcui1994@gmail.com> * space issue Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update jp tn Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing unsed import Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Update post_processing.py Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * empty file Signed-off-by: Alex Cui <alexcui1994@gmail.com> * to delete Signed-off-by: Alex Cui <alexcui1994@gmail.com> * removing Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add contributing (#21) * add contributing Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * add Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add jenkins file (#23) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish TN (#12) * test now runs, but getting ordinal instead of cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * force ordinals to either have :a/:e or "." 
at the end Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal ordinal data Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for ordinals, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add // to symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add test cases for electronic; transformed with sed from spanish, so I expect errors Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for electronic, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to make electronic verbaliser work (not yet) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for fractions Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for fraction, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix language Signed-off-by: Jim O'Regan <joregan@kth.se> * fix graph construction to make pluralisation work Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for decimal, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for whitelist, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal test case for whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for word, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for date, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for 
measure, adapted from es Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a pair of test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix plurals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix number, but this whole thing is only partially adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add usd$ Signed-off-by: Jim O'Regan <joregan@kth.se> * insert "komma" Signed-off-by: Jim O'Regan <joregan@kth.se> * "pund" is neuter Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * towards proper graphs Signed-off-by: Jim O'Regan <joregan@kth.se> * GBP Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * make komma non-det Signed-off-by: Jim O'Regan <joregan@kth.se> * more money tagger fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more minor words Signed-off-by: Jim O'Regan <joregan@kth.se> * do a bit better with en/ett Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use the correct list Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * make sure the numbers have no 1 Signed-off-by: Jim O'Regan <joregan@kth.se> * abbreviations for million and milliard Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * add year suffixes Signed-off-by: Jim O'Regan <joregan@kth.se> * add minimal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * expansions of era abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras Signed-off-by: Jim O'Regan <joregan@kth.se> * use eras in verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * fix examples in comment Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix extension Signed-off-by: Jim O'Regan <joregan@kth.se> * fix separator Signed-off-by: Jim O'Regan <joregan@kth.se> * date verbaliser is broken, this does not fix it Signed-off-by: Jim O'Regan <joregan@kth.se> * load labels Signed-off-by: Jim O'Regan <joregan@kth.se> * right first time Signed-off-by: Jim O'Regan <joregan@kth.se> * missing space Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year in test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * getting closer to getting dates working Signed-off-by: Jim O'Regan <joregan@kth.se> * add a (failing) test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * date working now Signed-off-by: Jim O'Regan <joregan@kth.se> * also handle decades Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * years where -00 is -hundra Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for telephone (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * changes to telephone tagger/verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add partially incomplete test data Signed-off-by: Jim O'Regan <joregan@kth.se> * mostly fixed test cases 
Signed-off-by: Jim O'Regan <joregan@kth.se> * more in progress changes to telephone parts Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * much prodding later, turns out I forgot a space Signed-off-by: Jim O'Regan <joregan@kth.se> * missed wrapping Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "no difference" This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9. Signed-off-by: Jim O'Regan <joregan@kth.se> * telephone tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try adding brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * try adding more brackets Signed-off-by: Jim O'Regan <joregan@kth.se> * fix another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a comment, because I confused myself Signed-off-by: Jim O'Regan <joregan@kth.se> * move abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add in abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits Signed-off-by: Jim O'Regan <joregan@kth.se> * single digit Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case/remove a duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * use the nice variable I just added to cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * this is not right; leading zeros fail Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "this is not right; leading 
zeros fail" This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f. Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, this seems to work Signed-off-by: Jim O'Regan <joregan@kth.se> * drop the tests starting with comma Signed-off-by: Jim O'Regan <joregan@kth.se> * decimal tagger works Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * lower case Signed-off-by: Jim O'Regan <joregan@kth.se> * add klockan and variants as a prompt, so they are not silently deleted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a very minimal test case for time Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite with less ambiguity, hms Signed-off-by: Jim O'Regan <joregan@kth.se> * add prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * copy the roman handling from es Signed-off-by: Jim O'Regan <joregan@kth.se> * greek letters Signed-off-by: Jim O'Regan <joregan@kth.se> * some fixes to the time tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test runner for time (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * test runner for time (adapted from es) ((actually adapted)) Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on time Signed-off-by: Jim O'Regan <joregan@kth.se> * |=, not = Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt verbaliser a little Signed-off-by: Jim O'Regan <joregan@kth.se> * add some test cases from module comments Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables to check Signed-off-by: Jim O'Regan <joregan@kth.se> * small fix Signed-off-by: Jim O'Regan <joregan@kth.se> * comment some stuff that needs major changes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove Signed-off-by: Jim O'Regan <joregan@kth.se> * try doing this here Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try doing this here" This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix errors in tests Signed-off-by: Jim O'Regan <joregan@kth.se> * minimal test cases for measure Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq everything, see what the difference is Signed-off-by: Jim O'Regan <joregan@kth.se> * merge different tsvs Signed-off-by: Jim O'Regan <joregan@kth.se> * fix casing to avoid conflicts Signed-off-by: Jim O'Regan <joregan@kth.se> * export some variables for testing Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * need an en/ett split here too Signed-off-by: Jim O'Regan <joregan@kth.se> * fix decimal subgraph Signed-off-by: Jim O'Regan <joregan@kth.se> * remove todo, I've just done it Signed-off-by: Jim O'Regan <joregan@kth.se> * remove missing integer test, does not work elsewhere Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek letters in maths Signed-off-by: Jim O'Regan <joregan@kth.se> * include greek here too Signed-off-by: Jim O'Regan <joregan@kth.se> * minor sg/pl Signed-off-by: Jim O'Regan <joregan@kth.se> * dedup Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * put these under if, too Signed-off-by: Jim O'Regan <joregan@kth.se> * no; there are no minor neuters, so that is not relevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * remove greek from here, interferes with delimeter Signed-off-by: Jim O'Regan <joregan@kth.se> * export variables to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * fix some test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * 
here is one error Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * put ensure_space in graph_utils Signed-off-by: Jim O'Regan <joregan@kth.se> * handle cases where unit follows amount Signed-off-by: Jim O'Regan <joregan@kth.se> * export a variable Signed-off-by: Jim O'Regan <joregan@kth.se> * add a tesst case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * . is not a cardinal separator Signed-off-by: Jim O'Regan <joregan@kth.se> * fix case Signed-off-by: Jim O'Regan <joregan@kth.se> * add yen Signed-off-by: Jim O'Regan <joregan@kth.se> * final fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove English roman tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_lm.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * remove some unused pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * add tokenize_and_classify_with_audio.py (adapted from en) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test pieces for audio (recopied from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * add audio test (adapted from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * in non-deterministic mode, generate both en and ett Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal non-deterministic test Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan <joregan@kth.se> * warnings about missing whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * add sv Signed-off-by: Jim O'Regan <joregan@kth.se> * remove commented pieces/things that will not be used Signed-off-by: Jim O'Regan 
<joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some Riksdag specific titles Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright to the other files with non-trivial changes Signed-off-by: Jim O'Regan <joregan@kth.se> * fix year Signed-off-by: Jim O'Regan <joregan@kth.se> * add Swedish support in pynini_export Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Swedish support for sparrowhark tests -- untested (: Signed-off-by: Jim O'Regan <joregan@kth.se> * address codeql comments Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change decade to year; sparrowhawk enforces categories Signed-off-by: Jim O'Regan <joregan@kth.se> * shoehorn this stuff into the overly narrow sparrowhawk classes Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "shoehorn this stuff into the overly narrow sparrowhawk classes" This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * read out the AM/PM words, they are not read as letters anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * change date verbaliser to manage isolated decades Signed-off-by: Jim O'Regan <joregan@kth.se> * redo changes to get rid of 'prompt' for 'klockan' Signed-off-by: Jim O'Regan <joregan@kth.se> * remove broken duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * add a case for hours without minutes (which should not happen) Signed-off-by: Jim O'Regan <joregan@kth.se> * time tests now pass Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a time test case that also passes here, but not in sparrowhawk Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error in dates, add more tests Signed-off-by: Jim O'Regan <joregan@kth.se> * import delete_preserve_order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql feedback Signed-off-by: Jim O'Regan <joregan@kth.se> * add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful) Signed-off-by: Jim O'Regan <joregan@kth.se> * move to the correct subdirectory Signed-off-by: Jim O'Regan <joregan@kth.se> * add swedish Signed-off-by: Jim O'Regan <joregan@kth.se> * remove pynini checks Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error with 1000 in non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix here also Signed-off-by: Jim O'Regan <joregan@kth.se> * also generate a string of digits if not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * add a date case Signed-off-by: Jim O'Regan <joregan@kth.se> * remove duplication 
Signed-off-by: Jim O'Regan <joregan@kth.se> * boost n_tagged Signed-off-by: Jim O'Regan <joregan@kth.se> * also copyright this year Signed-off-by: Jim O'Regan <joregan@kth.se> * 1500 only fixes one, boost again Signed-off-by: Jim O'Regan <joregan@kth.se> * 2500 does nothing, going to -1 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway" This reverts commit 383a096083061b0c79457e815a65e55563c7ac74. Signed-off-by: Jim O'Regan <joregan@kth.se> * try setting a low weight to everything non-default Signed-off-by: Jim O'Regan <joregan@kth.se> * put n_tagged back to 500 Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * days of the week Signed-off-by: Jim O'Regan <joregan@kth.se> * add more abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. 
Re-reverting Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * remove blank line Signed-off-by: Jim O'Regan <joregan@kth.se> * forgot to remove this piece in the merge conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * remove erroneously added copyright notice Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * add __init__.py in a few places it was missing Signed-off-by: Jim O'Regan <joregan@kth.se> * add the google notice required by the incoming contributing document Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CI setup (#25) * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci _cr Signed-off-by: ekmb <ebakhturina@nvidia.com> * revert setup tool Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove pytest-runner from setup.py Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update 
test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test dir Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Merge EN riva release 22.10 (#26) * Merge EN riva release 22.10 Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng TN - update urls to handle dictionary words (#27) * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip el words Signed-off-by: ekmb <ebakhturina@nvidia.com> * wip Signed-off-by: ekmb <ebakhturina@nvidia.com> * electronic pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * test pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * remove unused imports Signed-off-by: ekmb <ebakhturina@nvidia.com> * add deterministic option normalized options Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins grammar folder Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up, update for SH Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * clean up Signed-off-by: ekmb <ebakhturina@nvidia.com> * reduce cardinal graph Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * add weight for sh Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Tn en astronomical no (#28) * Add 
support for large numbers (>999,999,999,999,999) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder in Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Increase mem size for CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Updating shmem for docker to deal with memory overflow Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Ensure large au cardinal graph is used only if deterministic Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Make comma mandatory in cardinals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run FST cache generation and Pytests in separate stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix stage Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add whitelist param to ITN (#30) * add whitelist param to itn Signed-off-by: ekmb <ebakhturina@nvidia.com> * add whitelist to export Signed-off-by: ekmb <ebakhturina@nvidia.com> * update docstrings Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Eng tn itn (#31) * Add additional units and plurals Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for financial periods (1H22, 2Q19) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing plural for "gigabit per second" Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix for measures Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use environment variables to set path of fst cache Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Fix environment variable Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Fix parse "None" as string (#33) * Fix parse "None" as string Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * read double digits for telephone grammar (#32) * read double digits for telephone grammar Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import zero graph instead of hard coding Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> --------- Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#35) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Install (#36) * remove conda pynini install Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added pynini install note Signed-off-by: Yang Zhang 
<yangzhang@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix text Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.6rc0 (#37) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add ci (#39) * Add additional languages to CI Pipeline Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix Jenkinsfile Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing 'ar' in lang options Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing 'ar' in normalize.py Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Correct name of verbalizer far Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Run language tests in stages Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update DE cache folder Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add VI, RU, SV CI tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix misssing bracket, add ZH Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Use non-deterministic TN for RU Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41) Signed-off-by: Jim O'Regan <joregan@kth.se> 
Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update fr cache path for ci (#44) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update ITN to work after Punctuation capitalization model (#22) * add cases with capitalization, cardinal, decimal pass Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix telephone, ordinal Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * restarting ci Signed-off-by: ekmb <ebakhturina@nvidia.com> * update electronic Signed-off-by: ekmb <ebakhturina@nvidia.com> * review feedback, update whitelist Signed-off-by: ekmb <ebakhturina@nvidia.com> * rename capitalize func Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix SH tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix tests Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins folder name Signed-off-by: ekmb <ebakhturina@nvidia.com> * added cased arg to ITN Signed-off-by: ekmb <ebakhturina@nvidia.com> * add input_case arg to other lang Signed-off-by: ekmb <ebakhturina@nvidia.com> * jenkins dirs update Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * update test Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix codeql errors Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix sh Signed-off-by: ekmb <ebakhturina@nvidia.com> * review Signed-off-by: ekmb <ebakhturina@nvidia.com> * update jenkins dir Signed-off-by: ekmb <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default value Signed-off-by: ekmb <ebakhturina@nvidia.com> --------- Signed-off-by: ekmb <ebakhturina@nvidia.com> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#42) * Add support for Financial year and for years between 1000 BC and 1000AD Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add support for product names and add abbreviations to whitelist Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add weights for some sequences, exclude 'a' before numeric sequence Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache folder for EN Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update FR Cache path Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Move text to TSV files, and some code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add additional vocabulary, allow singular usage of units to support adjective phrases Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix issue with whitelist loader not handling weights correctly Move cased loader file to graph_utils Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * insert space between value and unit Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Insert space between measurement and unit. 
Adjust weight for ordinal Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update doc and fix alignment for itn (#47) * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * save Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * extend alignment for itn Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Align ci test (#51) * added jenkins tests for aligment Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * added test to pr doc Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci test Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix ci Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * fix Signed-off-by: Yang Zhang <yangzhang@nvidia.com> --------- Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Audio-based TN for Swedish (#49) * Audio-based TN for Swedish, for Språkbanken Tal Replaces #48 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating cache directory (Not entirely sure what the pattern is) Signed-off-by: Jim O’Regan <joregan@kth.se> * Delete tokenize_and_classify_lm.py Signed-off-by: Jim O’Regan <joregan@kth.se> * fraction fix from ITN branch Signed-off-by: Jim O'Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan 
<joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix sv tests (#52) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.7 release (#53) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * En names (#56) * Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update Jenkinsfile Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix bug for hh:mm:ss normalization (#57) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * rewrite regex to silence deprecation warning (#55) Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Hungarian TN ✅ (#9) * additional exports from cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for quantities Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * change integer Signed-off-by: Jim O'Regan <joregan@kth.se> * fixes to verbaliser for decimal Signed-off-by: Jim O'Regan <joregan@kth.se> * more test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add superessive forms (powers of) Signed-off-by: Jim O'Regan <joregan@kth.se> * superscript to superessive Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes 
from pre-commit.com hooks for more information, see https://pre-commit.ci * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * add vowels Signed-off-by: Jim O'Regan <joregan@kth.se> * fix var Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum electronic test Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a symbol Signed-off-by: Jim O'Regan <joregan@kth.se> * add incomplete time tagger (partially adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * fix error with some inflected abbreviations Signed-off-by: Jim O'Regan <joregan@kth.se> * add some alternative measure forms Signed-off-by: Jim O'Regan <joregan@kth.se> * hour, minute, second; whichever is last can be inflected Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test runner for time Signed-off-by: Jim O'Regan <joregan@kth.se> * add very minimal time test Signed-off-by: Jim O'Regan <joregan@kth.se> * will want cardinal here Signed-off-by: Jim O'Regan <joregan@kth.se> * add inflection for things like GBP, where inflection is based on pé Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring Signed-off-by: Jim O'Regan <joregan@kth.se> * move two letters Signed-off-by: Jim O'Regan <joregan@kth.se> * add my copyright Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted number tagger (adapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * small changes Signed-off-by: Jim O'Regan <joregan@kth.se> * add unadapted measure tagger (from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * other ways of reading w Signed-off-by: Jim O'Regan <joregan@kth.se> * for non deterministic, a bunch of these symbols 
can be read as letters Signed-off-by: Jim O'Regan <joregan@kth.se> * currency Signed-off-by: Jim O'Regan <joregan@kth.se> * more inflection Signed-off-by: Jim O'Regan <joregan@kth.se> * get the abbreviation expanded as letters for non-deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * working now, add a comment Signed-off-by: Jim O'Regan <joregan@kth.se> * also integer, and preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * also accept the full words Signed-off-by: Jim O'Regan <joregan@kth.se> * deduplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * reorder to make a bit more sense Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt comments Signed-off-by: Jim O'Regan <joregan@kth.se> * commenting out weighted part makes this work Signed-off-by: Jim O'Regan <joregan@kth.se> * duplicate space Signed-off-by: Jim O'Regan <joregan@kth.se> * partially adapted money verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * actually saving the adaptations Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add time_zone data (copy from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * delete commented code, irrelevant here Signed-off-by: Jim O'Regan <joregan@kth.se> * small modifications, still thinking about how to tackle this Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * change year of copyright in empty files, they aren't eligible anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing tabs Signed-off-by: Jim O'Regan 
<joregan@kth.se> * remove pynini checks from tests Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for measure (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for telephone (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * add verbaliser for time (unadapted from de) Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment everything. yolo. Signed-off-by: Jim O'Regan <joregan@kth.se> * fix cache dir Signed-off-by: Jim O'Regan <joregan@kth.se> * tagger for telephone (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add basic tests (native verified) Signed-off-by: Jim O'Regan <joregan@kth.se> * add components for read digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add an example with a different separator Signed-off-by: Jim O'Regan <joregan@kth.se> * start adapting Signed-off-by: Jim O'Regan <joregan@kth.se> * add 2-digit area codes Signed-off-by: Jim O'Regan <joregan@kth.se> * add another Signed-off-by: Jim O'Regan <joregan@kth.se> * add Bp to area codes, no need to be that specific Signed-off-by: Jim O'Regan <joregan@kth.se> * export var Signed-off-by: Jim O'Regan <joregan@kth.se> * in progress Signed-off-by: Jim O'Regan <joregan@kth.se> * country codes Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy/paste errors abound Signed-off-by: Jim O'Regan <joregan@kth.se> * put in a function rather than duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * nominal digits Signed-off-by: Jim O'Regan <joregan@kth.se> * add IP prompt Signed-off-by: Jim O'Regan <joregan@kth.se> * add google copyright notice; probably meaningless Signed-off-by: Jim O'Regan <joregan@kth.se> * more work on telephone 
Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import Signed-off-by: Jim O'Regan <joregan@kth.se> * fix path Signed-off-by: Jim O'Regan <joregan@kth.se> * minor adaptation; more needed Signed-off-by: Jim O'Regan <joregan@kth.se> * replace time verbaliser with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt more Signed-off-by: Jim O'Regan <joregan@kth.se> * nearly there Signed-off-by: Jim O'Regan <joregan@kth.se> * replace with version from sv Signed-off-by: Jim O'Regan <joregan@kth.se> * extend tests Signed-off-by: Jim O'Regan <joregan@kth.se> * some tweaks Signed-off-by: Jim O'Regan <joregan@kth.se> * add an IP test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add a couple more ordinal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * move variables Signed-off-by: Jim O'Regan <joregan@kth.se> * filter ordinals Signed-off-by: Jim O'Regan <joregan@kth.se> * basic fraction tests Signed-off-by: Jim O'Regan <joregan@kth.se> * . 
and / both clash, so only make year optional if it is not deterministic Signed-off-by: Jim O'Regan <joregan@kth.se> * using the other word for two, that test cannot pass Signed-off-by: Jim O'Regan <joregan@kth.se> * numerator and denominator can compound; qdd minus Signed-off-by: Jim O'Regan <joregan@kth.se> * form fractionals in ordinal, because something about bare_ordinals does not work when exported Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test, including spaces Signed-off-by: Jim O'Regan <joregan@kth.se> * works in the repl, not in reality Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * copy fraction symbols from es Signed-off-by: Jim O'Regan <joregan@kth.se> * copy two lines from es to handle faction symbols Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test for that Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * extend Signed-off-by: Jim O'Regan <joregan@kth.se> * ah, I was forgetting to delete preserve order Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add pieces from swedish itn, adapted Signed-off-by: Jim O'Regan <joregan@kth.se> * add a function to give from/to minutes for 15/30/45 subdivision Signed-off-by: Jim O'Regan <joregan@kth.se> * add functions, but some pieces came from ITN, so are backwards Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, should change the quarter word to a cardinal, or something Signed-off-by: Jim O'Regan <joregan@kth.se> * swapping order Signed-off-by: Jim O'Regan <joregan@kth.se> * more swapping Signed-off-by: Jim O'Regan <joregan@kth.se> * remove import Signed-off-by: Jim O'Regan <joregan@kth.se> * 
add an example Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change some things Signed-off-by: Jim O'Regan <joregan@kth.se> * some things fixed Signed-off-by: Jim O'Regan <joregan@kth.se> * more adjustments to time Signed-off-by: Jim O'Regan <joregan@kth.se> * more todo, but working for this subset Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more time Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missing endings Signed-off-by: Jim O'Regan <joregan@kth.se> * sort|uniq Signed-off-by: Jim O'Regan <joregan@kth.se> * timezone can be inflected too Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sparrowhark test (todo) Signed-off-by: Jim O'Regan <joregan@kth.se> * add test_cases_word (copy from sv) Signed-off-by: Jim O'Regan <joregan@kth.se> * add some word cases with Hungarian accents Signed-off-by: Jim O'Regan <joregan@kth.se> * add Hungarian to Jenkinsfile. 
This may cause much distress and wailing and gnashing of teeth Signed-off-by: Jim O'Regan <joregan@kth.se> * fix the commented ITN part Signed-off-by: Jim O'Regan <joregan@kth.se> * add hu Signed-off-by: Jim O'Regan <joregan@kth.se> * basic test cases for the last two parts Signed-off-by: Jim O'Regan <joregan@kth.se> * fix measure cardinals Signed-off-by: Jim O'Regan <joregan@kth.se> * a couple more tests, last still not working Signed-off-by: Jim O'Regan <joregan@kth.se> * missed removing preserver_order Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * codeql Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the variables I may wish to use later (codeql) Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix decimals Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * incorporate feedback from @Laszlo-Weber Signed-off-by: Jim O'Regan <joregan@kth.se> * bare minimum tests + fix verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add öre (also for NOK) Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> * Comment line, for now Signed-off-by: Jim O’Regan <joregan@kth.se> * try breaking this into pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing __init__.py Signed-off-by: Jim O'Regan <joregan@kth.se> * revert 0c6823e111a876495702d347cf7b347106388ed4 Signed-off-by: Jim O'Regan <joregan@kth.se> * fix a bug in cardinal graph 
Signed-off-by: Jim O'Regan <joregan@kth.se> * at no point is 000 being deleted; probably why the tests are weird Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert a0d031a861fcd7b5750027f2887f3344f39b6616 Signed-off-by: Jim O'Regan <joregan@kth.se> * add more spaced alternatives to the non-deterministic cases Signed-off-by: Jim O'Regan <joregan@kth.se> * add the hyphen before or-ing with 000 Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change money handling to keep sparrowhawk happy Signed-off-by: Jim O'Regan <joregan@kth.se> * add [be]os_or_space Signed-off-by: Jim O'Regan <joregan@kth.se> * try just rewriting the offending pieces to see if they are coming from here Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try just rewriting the offending pieces to see if they are coming from here" This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0. Signed-off-by: Jim O'Regan <joregan@kth.se> * add extra spaced versions Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try here Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok... seems to not be happening here either Revert "try here" This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2. Signed-off-by: Jim O'Regan <joregan@kth.se> * try moving a test to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * try duplicating to see if it fails twice Signed-off-by: Jim O'Regan <joregan@kth.se> * Ok, fails both times Revert "try duplicating to see if it fails twice" This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.
Signed-off-by: Jim O'Regan <joregan@kth.se> * 1 fails in some places, 2 in others, so add 2 here and see if that also fails Signed-off-by: Jim O'Regan <joregan@kth.se> * see if this makes a difference Signed-off-by: Jim O'Regan <joregan@kth.se> * It does not Revert "see if this makes a difference" This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28. Signed-off-by: Jim O'Regan <joregan@kth.se> * rewrite regex to silence deprecation warning Signed-off-by: Jim O'Regan <joregan@kth.se> * REVERTME: change to see what is happening Signed-off-by: Jim O'Regan <joregan@kth.se> * that missing bracket cannot have been good Signed-off-by: Jim O'Regan <joregan@kth.se> * no difference, try just deleting leading zero Signed-off-by: Jim O'Regan <joregan@kth.se> * try again Signed-off-by: Jim O'Regan <joregan@kth.se> * move that thing, merge some lines Signed-off-by: Jim O'Regan <joregan@kth.se> * at least it fails quickly Signed-off-by: Jim O'Regan <joregan@kth.se> * export original Signed-off-by: Jim O'Regan <joregan@kth.se> * move things around for no real reason Signed-off-by: Jim O'Regan <joregan@kth.se> * add in the clean_cardinal from the tutorial Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "add in the clean_cardinal from the tutorial" This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac. Signed-off-by: Jim O'Regan <joregan@kth.se> * try this again Signed-off-by: Jim O'Regan <joregan@kth.se> * pretty sure this should work. As should the other Signed-off-by: Jim O'Regan <joregan@kth.se> * comment the ugly kludges to make them easier to remove. They do not work anyway Signed-off-by: Jim O'Regan <joregan@kth.se> * ok, try here Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "rewrite regex to silence deprecation warning" This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "REVERTME: change to see what is happening" This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6. Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * export unfiltered version of cardinal graph Signed-off-by: Jim O'Regan <joregan@kth.se> * change the variable names Signed-off-by: Jim O'Regan <joregan@kth.se> * get rid of duplicate input print Signed-off-by: Jim O'Regan <joregan@kth.se> * BUGHUNT: check if string has been escaped Signed-off-by: Jim O'Regan <joregan@kth.se> * changing variable, because I am getting tired of looking at that overly long name Signed-off-by: Jim O'Regan <joregan@kth.se> * try deleting the normaliser to see if that makes any difference Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "BUGHUNT: check if string has been escaped" This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056. Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "try deleting the normaliser to see if that makes any difference" This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * moving globals into __init__ fixes the problem Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_sparrowhawk_normalization.sh Signed-off-by: Jim O’Regan <joregan@kth.se> * prompt: is not part of the ontology sparrowhawk recognises Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * these two now conflict Signed-off-by: Jim O'Regan <joregan@kth.se> * rearrange slightly Signed-off-by: Jim O'Regan <joregan@kth.se> * Update telephone.py remove unused import Signed-off-by: Jim O’Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <jaoregan@tcd.ie> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Es bugfix (#59) * improve shortest path for decimals and currency Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix sh tn test files for telephone Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * replace non-breaking space Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve ambiguous test cases Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * refine weights for decimal Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * improve testing when there are multiple shortest paths Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * revert ES TN for measures with mixed fractions Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix formatting Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add comment for testing multiple shortest paths 
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Store input_case in Normalizer (#65) Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <joregan@kth.se> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <joregan@kth.se> * whitespace fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * also fix in the verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * Update Jenkinsfile Signed-off-by: Jim O’Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <enno.hermann@idiap.ch> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add inits Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[…
What does this PR do?
Add a one line overview of what this PR aims to accomplish.
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs in various areas.
Additional Information