Inflection 85: Updating Malayalam grammar.xml and following files #138

deonajulary06 · 2025-06-26T19:54:29Z

Updating Malayalam grammar.xml and following file changes from this second time around will be on this branch to consolidate all changes across all files as suggested per George's email

grhoten

This looks like good progress. I've added some comments to help guide you to fix some test failures.

grhoten · 2025-06-27T05:42:05Z

inflection/resources/org/unicode/inflection/inflection/pronoun_ml.csv

+എന്റെ,first,singular,genitive
+എന്റെത്,first,singular,genitive


In this example, it seems that എന്റെ means "my", and എന്റെത് means "mine". If that is the case, then you want to mirror what pronoun_en.csv says. You would want to use the following:

എന്റെ,first,singular,genitive,dependent എന്റെത്,first,singular,genitive,independent

You would need to also add the following to grammar.xml.

<category name="determination"> <restrictions> <restriction name="pos" value="pronoun"/> <restriction name="case" value="genitive"/> </restrictions> <grammeme name="independent"/>  <grammeme name="dependent"/>  </category>

Do not use dependency= because that's for getting the grammemes (the grammatical category values) of the object being possessed, which is not relevant here.

grhoten · 2025-06-27T05:47:51Z

inflection/src/inflection/grammar/synthesis/MlGrammarSynthesizer_MlDisplayFunction.cpp

+    static dialog::DictionaryLookupInflector dictionaryInflector(
+        util::LocaleUtils::MALAYALAM(),
+        {
+            {CASE_NOMINATIVE, CASE_ACCUSATIVE, CASE_DATIVE, CASE_GENITIVE, CASE_LOCATIVE, CASE_INSTRUMENTAL, CASE_SOCIATIVE},
+            {NUMBER_SINGULAR, NUMBER_PLURAL},
+            {GENDER_MASCULINE, GENDER_FEMININE, GENDER_NEUTER},
+            {FORMALITY_FORMAL, FORMALITY_INFORMAL},
+            {CLUSIVITY_INCLUSIVE, CLUSIVITY_EXCLUSIVE},
+            {PERSON_FIRST, PERSON_SECOND, PERSON_THIRD},
+            {TENSE_PAST, TENSE_PRESENT, TENSE_FUTURE},
+            {MOOD_INDICATIVE, MOOD_IMPERATIVE, MOOD_SUBJUNCTIVE}
+        },
+        {},
+        true
+    );


This should be a part of the class and not statically declared. It's preferable to minimize the statically allocated objects like this in case we ever want to minimize the memory being used when switching languages in the library.

Also you're missing the part of speech ordering. You should consider adding this as the first line to prioritize nouns first.

{GrammemeConstants::POS_NOUN(), GrammemeConstants::POS_ADJECTIVE(), GrammemeConstants::POS_VERB()},

grhoten · 2025-06-27T05:48:45Z

inflection/src/inflection/grammar/synthesis/MlGrammarSynthesizer_MlDisplayFunction.cpp

+    if (!inflectedOpt.has_value() && enableInflectionGuess) {
+    const bool isNoun = (posFeatureValue == u"noun");
+    const bool isPluralTarget = (numberFeatureValue == NUMBER_PLURAL);
+
+    if (isNoun && isPluralTarget) {
+        const std::u16string token = displayValue->getDisplayString();
+
+        std::u16string guessed = token;
+        if (!token.empty() && token.back() != u'്') {
+            guessed += u"കൾ";
+        } else {
+            guessed += u"ങ്ങൾ‍";
+        }
+
+        if (guessed != token) {
+            return new DisplayValue(guessed, constraints);
+        }
+    }
+}
+
+return nullptr;


This indentation looks off. Can you please fix that?

grhoten · 2025-06-27T05:53:56Z

inflection/test/resources/inflection/dialog/inflection/ml.xml

+<!-- Genitive forms without dependency -->
+<test><source case="genitive" person="first" number="singular">ഞാൻ</source><result>എന്റെ</result></test>
+<test><source clusivity="exclusive" case="genitive" person="first" number="plural">ഞങ്ങൾ</source><result>ഞങ്ങളുടെ</result></test>
+<test><source clusivity="inclusive" case="genitive" person="first" number="plural">നാം</source><result>നമ്മുടെ</result></test>
+<test><source formality="formal" case="genitive" person="second" number="singular">താങ്കൾ</source><result>താങ്കളുടെ</result></test>
+
+<!-- Indicative (default) -->
+<test><source tense="present" mood="indicative" person="third" number="singular">വരിക</source><result>വരുന്നു</result></test>
+
+<!-- Imperative -->
+<test><source mood="imperative" person="second" number="singular" formality="informal">വരിക</source><result>വരു</result></test>
+<test><source mood="imperative" person="second" number="singular" formality="formal">വരിക</source><result>വരുക</result></test>
+
+<!-- Subjunctive -->
+<test><source mood="subjunctive" person="third" number="singular">വരിക</source><result>വരുമെന്ന്</result></test>
+<test><source mood="subjunctive" person="first" number="singular">ചോദിക്കുക</source><result>ചോദിക്കാമെന്ന്</result></test>


It's uncommon to be this precise. Ideally you will want to specify the minimal grammemes. For example, you can pick an initial verb that is already subjunctive, third person, and singular, and then you only inflect it to plural without the other constraints. The DictionaryLookupInflector will typically take care of these other details.

grhoten · 2025-06-27T05:58:19Z

inflection/src/inflection/grammar/synthesis/MlGrammarSynthesizer_MlDisplayFunction.cpp

+    // Add case, number always
+    addIfNotEmpty(caseFeature);
+    addIfNotEmpty(numberFeature);
+
+    // Add gender only if pos is pronoun
+    if (posFeatureValue == u"pronoun") {
+        addIfNotEmpty(genderFeature);
+    }


This is very common to use. Though I'm a little surprised about the gender being supported by only pronouns. Hopefully that is intentional. In most languages, the gender doesn't apply when the plural form is used. I definitely recommend reusing the string constants available in GrammemeConstants. So you may want to replace u"pronoun". It makes it much easier when maintaining this code to see which implementations are using this grammeme.

grhoten · 2025-06-27T06:01:30Z

inflection/src/inflection/grammar/synthesis/MlGrammarSynthesizer_MlDisplayFunction.cpp

+    addIfNotEmpty(formalityFeature);
+    addIfNotEmpty(clusivityFeature);
+    addIfNotEmpty(personFeature);
+    addIfNotEmpty(tenseFeature);
+    addIfNotEmpty(moodFeature);


This is very uncommon to allow to inflect. Typically it's only used during lemmaless inflection, but not as a user provided constraint. I recommend removing these lines to simplify the usage and maintenance. Inflecting for POS, gender, number, and case are the typical use cases in all of the language implementations.

grhoten · 2025-06-27T06:04:07Z

inflection/src/inflection/grammar/synthesis/MlGrammarSynthesizer_CaseLookupFunction.cpp

+   for (const auto& suffix : suffixes) {
+       if (displayString.size() > suffix.size() && ends_with(displayString, suffix)) {
+           return new ::inflection::dialog::SpeakableString(GrammemeConstants::CASE_GENITIVE());
+       }
+   }


This looks better. Instead of checking the length twice, how about checking it once, and use std::u16string::ends_with. It's used elsewhere in the code too. That's a part of newer versions of C++.

grhoten · 2025-06-27T06:06:54Z

inflection/src/inflection/dialog/language/MlCommonConceptFactory.cpp

+inflection::dialog::SpeakableString MlCommonConceptFactory::quantifyType(
+    const ::inflection::dialog::SpeakableString& formattedNumber,
+    const SemanticFeatureConceptBase& semanticConcept,
+    bool useDefault,
+    Plurality::Rule countType) const
+{
+    ::std::unique_ptr<::inflection::dialog::SpeakableString> speakableResult;
+    if (!useDefault) {
+        ::std::unique_ptr<SemanticFeatureConceptBase> semanticConceptClone(npc(semanticConcept.clone()));
+
+        if (Plurality::Rule:: countType) {
+            semanticConceptClone->putConstraint(*npc(semanticFeatureCount), GrammemeConstants::NUMBER_SINGULAR());
+        } else {
+            semanticConceptClone->putConstraint(*npc(semanticFeatureCount), GrammemeConstants::NUMBER_PLURAL());
+        }
+
+        speakableResult.reset(semanticConceptClone->toSpeakableString());
+    }
+
+    if (speakableResult == nullptr) {
+        speakableResult.reset(semanticConcept.toSpeakableString());
+    }
+
+    return quantifiedJoin(formattedNumber, *npc(speakableResult.get()), {}, countType);
+}


This looks like the default implementation. I suspect that this method be deleted, and this file should look similar to EnCommonConceptFactory.cpp.

deonajulary06 added 18 commits May 27, 2025 10:16

Add Malayalam dictionary support

e359163

Add Malayalam language support in LocaleUtils.hpp

a109f4e

Add Malayalam locale support to LocaleUtils

9fd9e01

Add Malayalam language to the tests

72e7c28

Add Malayalam locale group ml_IN

95b483d

ADD: Malayalam tokenizer configuration file

681b568

Inflection-85: Add Git LFS config for Malayalam dictionary and XML files

bfa7ae6

Add Malayalam inflection and pronoun tests

5aa7ec4

Updated copyright line

f20350f

Updated copyright message

9cc0a44

Updated copright message

4bb9964

Updated language grammar to include Malayalam

a46c301

Added pronouns for Malayalam

df82a8b

Add ll GrammarSynthesizer files

8fcfb16

Add Malayalam grammar synthesizer

0c50978

Add Malayalam-specific CommonConceptFactory with lists and quantities

9523646

Update document on how to add a new language, fixed errors

a8a7f2d

Updated grammar.xml for Malayalam

e60ea6f

deonajulary06 requested a review from grhoten June 26, 2025 19:54

deonajulary06 added 4 commits June 26, 2025 14:55

Update pronoun_ml.csv

80af5bf

Updated all grammar synthesizer component for Malayalam

a70ad5e

Update Common Concept Factory files

6a191b6

Updated tests for Malayalam

421d0e4

grhoten reviewed Jun 27, 2025

View reviewed changes

deonajulary06 added 2 commits July 12, 2025 21:52

Fix Malayalam grammar synthesis and remove count lookup function

11d35cc

Updated Grammeme Constants files to include sociative case

de22098

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Inflection 85: Updating Malayalam grammar.xml and following files #138

Inflection 85: Updating Malayalam grammar.xml and following files #138

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

		എന്റെ,first,singular,genitive
		എന്റെത്,first,singular,genitive

Uh oh!

Inflection 85: Updating Malayalam grammar.xml and following files #138

Are you sure you want to change the base?

Inflection 85: Updating Malayalam grammar.xml and following files #138

Uh oh!

Conversation

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!