Dictionary generation: Maskwacîs dictionary entries lack <lc>, and that messes up paradigm generation
#120
Comments
If a lemma is solely in MD, that source gives no subtype of verb or noun, nor any of their inflectional subtypes. Those can be added, but only manually. So, MD only provides info on whether something is a verb or a noun, but not VTA or NA, nor VTA-2 or NA-4w. What exactly does |
Ideally, we'd add noun animacy to all of the lemmas unique to MD, even if it has to be done manually; however...
HOWEVER, as noted in the main issue, this particular form, "ayahciyiniw", exists as two entries: one with an empty |
I’ve taken In principle, one could even have two lemmas with the same part of speech, but with different inflectional classes. Can’t come up with any good example here. What I should be asking is what degree of specificity is explicitly needed in how NDS is coded. Seems to me that the paradigm and layout files, in their preamble specs, need to match the linguistic analysis (so N and A for animate nouns), while the inflectional subtype associated with each lemma and listed in the preamble, like NA-1 or NA-4w, only needs to be matched by the lemma in the XML source (potentially disambiguating otherwise similar items); these inflectional subtypes do not influence paradigm generation. The discrepancies in the XML are likely an issue of ambiguity in the comparison files and their linking with the original dictionary sources, and I think they’ll be best resolved by a single database for all sources (eventually). |
It's decided by the YAML header in the paradigm files. So, it's as specific as it is in the paradigm YAML. This could probably be done in a more straightforward way. I don't fully understand how this works either :( I think it can be inferred from the linguistic analysis, for Plains Cree? |
Note: having 'stranger' twice is due to an error in the CW source, which provides that English translation twice for 'ayahciyiniw', so it's not a matter of the script making an error. Also, the MD and CW entries are presented separately because 'ayahciyiniw' is missing from the comparison file. So these are issues with the source materials, for now, rather than with the scripts. |
Okay! I'm going to take this issue off of "stable version", as there are a number of real TODOs that need to be addressed before we even get to finish this one! Also, "lexical category" makes way more sense! |
Spun off from #117.
Some entries lack a value for <lc>. This is needed to do smart things when generating the search page, and is required to generate a paradigm. For example, "ayahciyiniw":

Note, this exists as a SEPARATE <e> tag, with all <t> source as "Cree : Words":

Resolving this will help resolve #117.
Possibly related to #104.
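
For triage, something like the following could list the entries that lack a usable <lc> value. This is only a rough sketch: the element layout (<r>, <lg>, <l>), the placeholder lemmas, and the function name `entries_missing_lc` are assumptions made for illustration, not the actual dictionary XML or importer code. Only the <e> and <lc> tag names come from this issue.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the real dictionary XML layout may differ.
# Only the <e> and <lc> tag names are taken from the issue text.
SAMPLE = """\
<r>
  <e><lg><l>lemma-a</l><lc>NA-1</lc></lg></e>
  <e><lg><l>lemma-b</l><lc></lc></lg></e>
  <e><lg><l>lemma-c</l></lg></e>
</r>
"""


def entries_missing_lc(root):
    """Return the lemma text of every <e> whose <lc> is absent or blank."""
    missing = []
    for entry in root.iter("e"):
        lc = entry.find(".//lc")
        # Treat a missing element, an empty element, and whitespace-only
        # text all as "no lexical category".
        if lc is None or not (lc.text or "").strip():
            # <l> as the lemma element is an assumption for this sketch.
            missing.append(entry.findtext(".//l", default="").strip())
    return missing


if __name__ == "__main__":
    for lemma in entries_missing_lc(ET.fromstring(SAMPLE)):
        print(lemma)
```

Running a check like this before paradigm generation would give a worklist of lemmas that need a manually assigned category, rather than having them fail later.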