8000 Inflection 85: Updating Malayalam grammar.xml and following files by deonajulary06 · Pull Request #138 · unicode-org/inflection · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Inflection 85: Updating Malayalam grammar.xml and following files #138

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

deonajulary06
Copy link
Collaborator

Updating Malayalam grammar.xml and following file changes from this second time around will be on this branch to consolidate all changes across all files as suggested per George's email

@deonajulary06 deonajulary06 requested a review from grhoten June 26, 2025 19:54
Copy link
Member
@grhoten grhoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like good progress. I've added some comments to help guide you to fix some test failures.

Comment on lines 4 to 5
എന്റെ,first,singular,genitive
എന്റെത്,first,singular,genitive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this example, it seems that എന്റെ means "my", and എന്റെത് means "mine". If that is the case, then you want to mirror what pronoun_en.csv says. You would want to use the following:

എന്റെ,first,singular,genitive,dependent
എന്റെത്,first,singular,genitive,independent

You would need to also add the following to grammar.xml.

            <category name="determination">
                <restrictions>
                    <restriction name="pos" value="pronoun"/>
                    <restriction name="case" value="genitive"/>
                </restrictions>
                <grammeme name="independent"/> <!-- e.g. mine -->
                <grammeme name="dependent"/> <!-- e.g. my {object} -->
            </category>

Do not use dependency= because that's for getting the grammemes (the grammatical category values) of the object being possessed, which is not relevant here.

Comment on lines +115 to +129
static dialog::DictionaryLookupInflector dictionaryInflector(
util::LocaleUtils::MALAYALAM(),
{
{CASE_NOMINATIVE, CASE_ACCUSATIVE, CASE_DATIVE, CASE_GENITIVE, CASE_LOCATIVE, CASE_INSTRUMENTAL, CASE_SOCIATIVE},
{NUMBER_SINGULAR, NUMBER_PLURAL},
{GENDER_MASCULINE, GENDER_FEMININE, GENDER_NEUTER},
{FORMALITY_FORMAL, FORMALITY_INFORMAL},
{CLUSIVITY_INCLUSIVE, CLUSIVITY_EXCLUSIVE},
{PERSON_FIRST, PERSON_SECOND, PERSON_THIRD},
{TENSE_PAST, TENSE_PRESENT, TENSE_FUTURE},
{MOOD_INDICATIVE, MOOD_IMPERATIVE, MOOD_SUBJUNCTIVE}
},
{},
true
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a part of the class and not statically declared. It's preferable to minimize the statically allocated objects like this in case we ever want to minimize the memory being used when switching languages in the library.

Also you're missing the part of speech ordering. You should consider adding this as the first line to prioritize nouns first.

            {GrammemeConstants::POS_NOUN(),  GrammemeConstants::POS_ADJECTIVE(), GrammemeConstants::POS_VERB()},

Comment on lines +136 to +156
if (!inflectedOpt.has_value() && enableInflectionGuess) {
const bool isNoun = (posFeatureValue == u"noun");
const bool isPluralTarget = (numberFeatureValue == NUMBER_PLURAL);

if (isNoun && isPluralTarget) {
const std::u16string token = displayValue->getDisplayString();

std::u16string guessed = token;
if (!token.empty() && token.back() != u'്') {
guessed += u"കൾ";
} else {
guessed += u"ങ്ങൾ‍";
}

if (guessed != token) {
return new DisplayValue(guessed, constraints);
}
}
}

return nullptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indentation looks off. Can you please fix that?

Comment on lines 105 to 120
<!-- Genitive forms without dependency -->
<test><source case="genitive" person="first" number="singular">ഞാൻ</source><result>എന്റെ</result></test>
<test><source clusivity="exclusive" case="genitive" person="first" number="plural">ഞങ്ങൾ</source><result>ഞങ്ങളുടെ</result></test>
<test><source clusivity="inclusive" case="genitive" person="first" number="plural">നാം</source><result>നമ്മുടെ</result></test>
<test><source formality="formal" case="genitive" person="second" number="singular">താങ്കൾ</source><result>താങ്കളുടെ</result></test>

<!-- Indicative (default) -->
<test><source tense="present" mood="indicative" person="third" number="singular">വരിക</source><result>വരുന്നു</result></test>

<!-- Imperative -->
<test><source mood="imperative" person="second" number="singular" formality="informal">വരിക</source><result>വരു</result></test>
<test><source mood="imperative" person="second" number="singular" formality="formal">വരിക</source><result>വരുക</result></test>

<!-- Subjunctive -->
<test><source mood="subjunctive" person="third" number="singular">വരിക</source><result>വരുമെന്ന്</result></test>
<test><source mood="subjunctive" person="first" number="singular">ചോദിക്കുക</source><result>ചോദിക്കാമെന്ന്</result></test>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's uncommon to be this precise. Ideally you will want to specify the minimal grammemes. For example, you can pick an initial verb that is already subjunctive, third person, and singular, and then you only inflect it to plural without the other constraints. The DictionaryLookupInflector will typically take care of these other details.

Comment on lines +92 to +99
// Add case, number always
addIfNotEmpty(caseFeature);
addIfNotEmpty(numberFeature);

// Add gender only if pos is pronoun
if (posFeatureValue == u"pronoun") {
addIfNotEmpty(genderFeature);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very common to use. Though I'm a little surprised about the gender being supported by only pronouns. Hopefully that is intentional. In most languages, the gender doesn't apply when the plural form is used. I definitely recommend reusing the string constants available in GrammemeConstants. So you may want to replace u"pronoun". It makes it much easier when maintaining this code to see which implementations are using this grammeme.

Comment on lines +101 to +105
addIfNotEmpty(formalityFeature);
addIfNotEmpty(clusivityFeature);
addIfNotEmpty(personFeature);
addIfNotEmpty(tenseFeature);
addIfNotEmpty(moodFeature);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very uncommon to allow to inflect. Typically it's only used during lemmaless inflection, but not as a user provided constraint. I recommend removing these lines to simplify the usage and maintenance. Inflecting for POS, gender, number, and case are the typical use cases in all of the language implementations.

Comment on lines 37 to 41
for (const auto& suffix : suffixes) {
if (displayString.size() > suffix.size() && ends_with(displayString, suffix)) {
return new ::inflection::dialog::SpeakableString(GrammemeConstants::CASE_GENITIVE());
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks better. Instead of checking the length twice, how about checking it once, and use std::u16string::ends_with. It's used elsewhere in the code too. That's a part of newer versions of C++.

Comment on lines 24 to 48
inflection::dialog::SpeakableString MlCommonConceptFactory::quantifyType(
const ::inflection::dialog::SpeakableString& formattedNumber,
const SemanticFeatureConceptBase& semanticConcept,
bool useDefault,
Plurality::Rule countType) const
{
::std::unique_ptr<::inflection::dialog::SpeakableString> speakableResult;
if (!useDefault) {
::std::unique_ptr<SemanticFeatureConceptBase> semanticConceptClone(npc(semanticConcept.clone()));

if (Plurality::Rule:: countType) {
semanticConceptClone->putConstraint(*npc(semanticFeatureCount), GrammemeConstants::NUMBER_SINGULAR());
} else {
semanticConceptClone->putConstraint(*npc(semanticFeatureCount), GrammemeConstants::NUMBER_PLURAL());
}

speakableResult.reset(semanticConceptClone->toSpeakableString());
}

if (speakableResult == nullptr) {
speakableResult.reset(semanticConcept.toSpeakableString());
}

return quantifiedJoin(formattedNumber, *npc(speakableResult.get()), {}, countType);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the default implementation. I suspect that this method be deleted, and this file should look similar to EnCommonConceptFactory.cpp.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0