This repository has been archived by the owner on May 6, 2021. It is now read-only.

Dictionary generation creates an empty <tg> for some entries #104

Open
eddieantonio opened this issue Dec 20, 2018 · 7 comments
Assignees
Labels
bug Something isn't working

Comments

@eddieantonio
Member

eddieantonio commented Dec 20, 2018

Some entries have an empty <tg> which gets rendered as an empty list item in itwêwina:

e.g., âhkamêyihtamowin

<e>
    <lg>
        <l pos="N">âhkamêyihtamowin</l>
        <lc>NI-1</lc>
        <stem>âhkamêyihtamowin-</stem>
    </lg>
    <mg>
        <tg xml:lang="eng">
            <t pos="N" sources="MD">Earnestness.</t>
        </tg>
    </mg>
    <mg>
        <tg xml:lang="eng">
        </tg>
    </mg>
</e>

which is rendered as:

[Screenshot taken 2018-12-20 at 9:56:03 a.m.]

Situations like these can be caught in #94; additionally, itwêwina could refuse to display empty translation groups.
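As a rough illustration of both ideas (catching empty translation groups and skipping them at display time), here is a minimal Python sketch using only the standard library. The function names and the wrapping `<r>` root element are assumptions made for this example; they are not taken from the dictionary generator or from itwêwina.

```python
import xml.etree.ElementTree as ET

# The entry from this issue, wrapped in a hypothetical <r> root element.
SAMPLE = """<r>
<e>
    <lg>
        <l pos="N">âhkamêyihtamowin</l>
        <lc>NI-1</lc>
        <stem>âhkamêyihtamowin-</stem>
    </lg>
    <mg>
        <tg xml:lang="eng">
            <t pos="N" sources="MD">Earnestness.</t>
        </tg>
    </mg>
    <mg>
        <tg xml:lang="eng">
        </tg>
    </mg>
</e>
</r>"""


def is_empty_tg(tg):
    """A <tg> counts as empty when no <t> inside it has non-whitespace text."""
    return not any((t.text or "").strip() for t in tg.iter("t"))


def find_entries_with_empty_tg(root):
    """Return the lemma of every <e> containing at least one empty <tg>."""
    lemmas = []
    for entry in root.iter("e"):
        if any(is_empty_tg(tg) for tg in entry.iter("tg")):
            lemmas.append(entry.findtext("lg/l", default="?"))
    return lemmas


def drop_empty_meaning_groups(entry):
    """Remove, in place, each <mg> whose <tg> children are all empty."""
    for mg in list(entry.findall("mg")):
        tgs = mg.findall("tg")
        if tgs and all(is_empty_tg(tg) for tg in tgs):
            entry.remove(mg)


root = ET.fromstring(SAMPLE)
print(find_entries_with_empty_tg(root))
```

The first function could back a validation step, while the second shows what "refuse to display" might look like if applied before rendering.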

@eddieantonio eddieantonio added the bug Something isn't working label Dec 20, 2018
@eddieantonio eddieantonio changed the title Dictionary generation creates an empty <tg> for some entries Dictionary generation creates an empty <tg> for some entries Dec 20, 2018
@aarppe
Contributor

aarppe commented Dec 20, 2018

The reason appears to be that the English gloss from CW was not extracted in 741 cases (even though a classification was done, presumably by consulting the CW source). In many cases this is likely due to a mismatch with a non-regularized <ý>: at least 638 of the Cree glosses have a <y>. I think this can be fixed semi-automatically.
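To make the <ý>/<y> mismatch concrete, here is a minimal Python sketch of a lookup that regularizes both sides before comparing. It assumes the CW data can be read as (lemma, gloss) pairs; the function names are hypothetical, and the strings in the usage lines are placeholders, not real dictionary entries.

```python
import unicodedata


def regularize(form):
    """Map <ý> to plain <y>, leaving other diacritics (â, ê, î, ô) untouched."""
    # Compose first, so "y" + combining acute (U+0301) becomes a single "ý".
    composed = unicodedata.normalize("NFC", form)
    return composed.replace("ý", "y").replace("Ý", "Y")


def build_gloss_index(cw_entries):
    """Index CW glosses by regularized lemma.

    cw_entries is an iterable of (lemma, gloss) pairs (an assumed shape).
    """
    index = {}
    for lemma, gloss in cw_entries:
        index.setdefault(regularize(lemma), []).append(gloss)
    return index


def lookup(index, lemma):
    """Return all candidate glosses for a lemma, ignoring the <ý>/<y> difference."""
    return index.get(regularize(lemma), [])


# Placeholder data only: "maýa" is not a real entry.
index = build_gloss_index([("maýa", "placeholder gloss")])
print(lookup(index, "maya"))
```

Returning a list rather than a single gloss keeps ambiguous matches visible, so they can be set aside for manual checking.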

@aarppe
Contributor

aarppe commented Dec 20, 2018

Brief scripting suggests the missing CW gloss can be recovered in 571 cases. This automatic insertion likely needs to be manually verified.

@aarppe
Contributor

aarppe commented Jan 8, 2019

I've revised the comparison source file by adding the missing cases when they could be unequivocally found in the CW source (there were only a few missing MD cases, and these were due to incorrect formatting, such as a missing tab).

But there are still over 100 missing CW glosses that need to be extracted by hand.

@eddieantonio
Member Author

> I've revised the comparison source file by adding the missing cases when they could be unequivocally found in the CW source (there were only a few missing MD cases, and these were due to incorrect formatting, such as a missing tab).
>
> But there's yet over a 100 missing CW glosses that need to be extracted by hand.

What do you mean "extracted by hand"?

@aarppe
Contributor

aarppe commented Jan 8, 2019

Human verification/analysis. These are cases where I wasn't able to automatically extract the lemma from the CW source based on the information in the MD vs. CW comparison file. They may be cases where the comparison is made between a lemma and an inflected form (conjugation). So this may require human judgement as to which CW gloss/lemma match is correct, and whether the CW gloss is actually missing or not.

@aarppe
Contributor

aarppe commented Jan 8, 2019

And then a bunch are dependent nouns where the FST analysis was corrupted when the comparison file was edited in a spreadsheet. Under the previous convention, these dependent nouns had a stem as their lemma, marked with an initial hyphen. The spreadsheet tried to interpret that as a reference, and when that failed it automatically marked the cell as #NAME?. So one cannot use the FST analysis field to extract the CW gloss automatically. Based on the file history, this happened before last summer (so anytime between then and when the comparison project files were created in August 2016).
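A minimal Python sketch of how the corrupted rows could be set aside for manual review, assuming the comparison file is tab-separated with a header row. The column name `fst_analysis` and the sample rows are hypothetical; the real file layout may differ.

```python
import csv
import io

SPREADSHEET_ERROR = "#NAME?"  # what the spreadsheet left in corrupted cells


def split_rows(tsv_text, fst_column="fst_analysis"):
    """Split rows into (usable, corrupted) based on the FST analysis field.

    fst_column is a hypothetical column name used only for this sketch.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    usable, corrupted = [], []
    for row in reader:
        value = (row.get(fst_column) or "").strip()
        if value == SPREADSHEET_ERROR:
            corrupted.append(row)  # needs a human to restore the analysis
        else:
            usable.append(row)
    return usable, corrupted


# Placeholder rows, not real dictionary data.
sample = (
    "lemma\tfst_analysis\n"
    "lemma1\tanalysis1\n"
    "lemma2\t#NAME?\n"
)
usable, corrupted = split_rows(sample)
print(len(usable), len(corrupted))
```

Only the `usable` rows would then go through automatic gloss extraction; the `corrupted` ones join the list of cases to be handled by hand.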

@eddieantonio
Member Author

Adding to this: "ocêkatâhk"/"Big Dipper" is not present in engcrk.xml, but it is present in crkeng.xml.


2 participants