-
-
Notifications
You must be signed in to change notification settings - Fork 16
Inflection 85: Updating Malayalam grammar.xml and following files #138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like good progress. I've added some comments to help guide you to fix some test failures.
എന്റെ,first,singular,genitive | ||
എന്റെത്,first,singular,genitive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this example, it seems that എന്റെ means "my", and എന്റെത് means "mine". If that is the case, then you want to mirror what pronoun_en.csv says. You would want to use the following:
എന്റെ,first,singular,genitive,dependent
എന്റെത്,first,singular,genitive,independent
You would need to also add the following to grammar.xml.
<category name="determination">
<restrictions>
<restriction name="pos" value="pronoun"/>
<restriction name="case" value="genitive"/>
</restrictions>
<grammeme name="independent"/> <!-- e.g. mine -->
<grammeme name="dependent"/> <!-- e.g. my {object} -->
</category>
Do not use dependency=
because that's for getting the grammemes (the grammatical category values) of the object being possessed, which is not relevant here.
static dialog::DictionaryLookupInflector dictionaryInflector( | ||
util::LocaleUtils::MALAYALAM(), | ||
{ | ||
{CASE_NOMINATIVE, CASE_ACCUSATIVE, CASE_DATIVE, CASE_GENITIVE, CASE_LOCATIVE, CASE_INSTRUMENTAL, CASE_SOCIATIVE}, | ||
{NUMBER_SINGULAR, NUMBER_PLURAL}, | ||
{GENDER_MASCULINE, GENDER_FEMININE, GENDER_NEUTER}, | ||
{FORMALITY_FORMAL, FORMALITY_INFORMAL}, | ||
{CLUSIVITY_INCLUSIVE, CLUSIVITY_EXCLUSIVE}, | ||
{PERSON_FIRST, PERSON_SECOND, PERSON_THIRD}, | ||
{TENSE_PAST, TENSE_PRESENT, TENSE_FUTURE}, | ||
{MOOD_INDICATIVE, MOOD_IMPERATIVE, MOOD_SUBJUNCTIVE} | ||
}, | ||
{}, | ||
true | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a part of the class and not statically declared. It's preferable to minimize the statically allocated objects like this in case we ever want to minimize the memory being used when switching languages in the library.
Also you're missing the part of speech ordering. You should consider adding this as the first line to prioritize nouns first.
{GrammemeConstants::POS_NOUN(), GrammemeConstants::POS_ADJECTIVE(), GrammemeConstants::POS_VERB()},
if (!inflectedOpt.has_value() && enableInflectionGuess) { | ||
const bool isNoun = (posFeatureValue == u"noun"); | ||
const bool isPluralTarget = (numberFeatureValue == NUMBER_PLURAL); | ||
|
||
if (isNoun && isPluralTarget) { | ||
const std::u16string token = displayValue->getDisplayString(); | ||
|
||
std::u16string guessed = token; | ||
if (!token.empty() && token.back() != u'്') { | ||
guessed += u"കൾ"; | ||
} else { | ||
guessed += u"ങ്ങൾ"; | ||
} | ||
|
||
if (guessed != token) { | ||
return new DisplayValue(guessed, constraints); | ||
} | ||
} | ||
} | ||
|
||
return nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This indentation looks off. Can you please fix that?
<!-- Genitive forms without dependency --> | ||
<test><source case="genitive" person="first" number="singular">ഞാൻ</source><result>എന്റെ</result></test> | ||
<test><source clusivity="exclusive" case="genitive" person="first" number="plural">ഞങ്ങൾ</source><result>ഞങ്ങളുടെ</result></test> | ||
<test><source clusivity="inclusive" case="genitive" person="first" number="plural">നാം</source><result>നമ്മുടെ</result></test> | ||
<test><source formality="formal" case="genitive" person="second" number="singular">താങ്കൾ</source><result>താങ്കളുടെ</result></test> | ||
|
||
<!-- Indicative (default) --> | ||
<test><source tense="present" mood="indicative" person="third" number="singular">വരിക</source><result>വരുന്നു</result></test> | ||
|
||
<!-- Imperative --> | ||
<test><source mood="imperative" person="second" number="singular" formality="informal">വരിക</source><result>വരു</result></test> | ||
<test><source mood="imperative" person="second" number="singular" formality="formal">വരിക</source><result>വരുക</result></test> | ||
|
||
<!-- Subjunctive --> | ||
<test><source mood="subjunctive" person="third" number="singular">വരിക</source><result>വരുമെന്ന്</result></test> | ||
<test><source mood="subjunctive" person="first" number="singular">ചോദിക്കുക</source><result>ചോദിക്കാമെന്ന്</result></test> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's uncommon to be this precise. Ideally you will want to specify the minimal grammemes. For example, you can pick an initial verb that is already subjunctive, third person, and singular, and then you only inflect it to plural without the other constraints. The DictionaryLookupInflector will typically take care of these other details.
// Add case, number always | ||
addIfNotEmpty(caseFeature); | ||
addIfNotEmpty(numberFeature); | ||
|
||
// Add gender only if pos is pronoun | ||
if (posFeatureValue == u"pronoun") { | ||
addIfNotEmpty(genderFeature); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very common to use. Though I'm a little surprised about the gender being supported by only pronouns. Hopefully that is intentional. In most languages, the gender doesn't apply when the plural form is used. I definitely recommend reusing the string constants available in GrammemeConstants
. So you may want to replace u"pronoun". It makes it much easier when maintaining this code to see which implementations are using this grammeme.
addIfNotEmpty(formalityFeature); | ||
addIfNotEmpty(clusivityFeature); | ||
addIfNotEmpty(personFeature); | ||
addIfNotEmpty(tenseFeature); | ||
addIfNotEmpty(moodFeature); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very uncommon to allow to inflect. Typically it's only used during lemmaless inflection, but not as a user provided constraint. I recommend removing these lines to simplify the usage and maintenance. Inflecting for POS, gender, number, and case are the typical use cases in all of the language implementations.
for (const auto& suffix : suffixes) { | ||
if (displayString.size() > suffix.size() && ends_with(displayString, suffix)) { | ||
return new ::inflection::dialog::SpeakableString(GrammemeConstants::CASE_GENITIVE()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks better. Instead of checking the length twice, how about checking it once, and use std::u16string::ends_with. It's used elsewhere in the code too. That's a part of newer versions of C++.
inflection::dialog::SpeakableString MlCommonConceptFactory::quantifyType( | ||
const ::inflection::dialog::SpeakableString& formattedNumber, | ||
const SemanticFeatureConceptBase& semanticConcept, | ||
bool useDefault, | ||
Plurality::Rule countType) const | ||
{ | ||
::std::unique_ptr<::inflection::dialog::SpeakableString> speakableResult; | ||
if (!useDefault) { | ||
::std::unique_ptr<SemanticFeatureConceptBase> semanticConceptClone(npc(semanticConcept.clone())); | ||
|
||
if (Plurality::Rule:: countType) { | ||
semanticConceptClone->putConstraint(*npc(semanticFeatureCount), GrammemeConstants::NUMBER_SINGULAR()); | ||
} else { | ||
semanticConceptClone->putConstraint(*npc(semanticFeatureCount), GrammemeConstants::NUMBER_PLURAL()); | ||
} | ||
|
||
speakableResult.reset(semanticConceptClone->toSpeakableString()); | ||
} | ||
|
||
if (speakableResult == nullptr) { | ||
speakableResult.reset(semanticConcept.toSpeakableString()); | ||
} | ||
|
||
return quantifiedJoin(formattedNumber, *npc(speakableResult.get()), {}, countType); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like the default implementation. I suspect that this method be deleted, and this file should look similar to EnCommonConceptFactory.cpp
.
Updating Malayalam grammar.xml and following file changes from this second time around will be on this branch to consolidate all changes across all files as suggested per George's email