Check validity of dictionary when starting up #94

eddieantonio · 2018-12-12T16:29:51Z

Ensure mistakes like #93 don't happen again.

Basically, do a few sanity checks before starting the app:

do all <source> elements have a <title>?
do all <source> elements have an ID?
do all <e> elements have one <lg> and at least one <mg>?
do all <mg> elements have at least one <tg>?
do all <tg> elements have at least one <t> element?
do all <t> elements with a sources attribute have a unique set of sources?
are all values in the <t sources> attribute valid <source> IDs?

EDIT: I'm pretty sure I can validate a lot of these things by creating an XML schema, and using a schema validator, but... that might be more effort than it's worth.

Print warnings on start up that are LOUDLY logged somewhere.
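
The checks listed above could be sketched roughly as follows. This is only an illustration, not the project's actual code: it uses Python's standard `xml.etree.ElementTree` and `logging` modules, and it assumes that a source's ID lives in an `id` attribute, that `sources` is a whitespace-separated list of IDs, and that "a unique set of sources" means no ID is repeated within one `sources` attribute. The names `find_problems`, `check_dictionary`, and the logger name are hypothetical.

```python
import logging
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical logger name; route it wherever warnings should be LOUD.
logger = logging.getLogger("dictionary.validation")


def find_problems(root: ET.Element) -> List[str]:
    """Return one human-readable message per failed sanity check."""
    problems = []

    # Every <source> needs a <title> and an ID (assumed attribute: "id").
    source_ids = set()
    for source in root.iter("source"):
        source_id = source.get("id")
        if source_id is None:
            problems.append("<source> without an ID")
        else:
            source_ids.add(source_id)
        if source.find("title") is None:
            problems.append(f"<source id={source_id!r}> has no <title>")

    for number, entry in enumerate(root.iter("e"), start=1):
        where = f"<e> #{number}"
        # Exactly one <lg> and at least one <mg> as direct children.
        if len(entry.findall("lg")) != 1:
            problems.append(f"{where}: expected exactly one <lg>")
        if not entry.findall("mg"):
            problems.append(f"{where}: no <mg>")
        # Every <mg> needs a <tg>; every <tg> needs a <t>.
        for mg in entry.iter("mg"):
            if mg.find("tg") is None:
                problems.append(f"{where}: <mg> without a <tg>")
        for tg in entry.iter("tg"):
            if tg.find("t") is None:
                problems.append(f"{where}: <tg> without a <t>")
        # <t sources="..."> must not repeat IDs and must only name known sources.
        for t in entry.iter("t"):
            sources = t.get("sources")
            if sources is None:
                continue
            ids = sources.split()  # assumption: whitespace-separated IDs
            if len(ids) != len(set(ids)):
                problems.append(f"{where}: duplicate IDs in sources={sources!r}")
            for unknown in sorted(set(ids) - source_ids):
                problems.append(f"{where}: unknown source ID {unknown!r}")

    return problems


def check_dictionary(xml_text: str) -> bool:
    """Log every problem as a warning; return True if the dictionary is clean."""
    problems = find_problems(ET.fromstring(xml_text))
    for problem in problems:
        logger.warning("Invalid dictionary: %s", problem)
    return not problems
```

Calling `check_dictionary` once at startup would surface every problem as a warning without preventing the app from starting; whether a failed check should instead abort startup is a separate design decision.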

aarppe · 2018-12-12T17:35:48Z

I had noticed this in a few cases. Currently, the reason is that some of the comparative matches/mismatches between CW and MD result from the descriptive analysis allowing for two inflected forms belonging to different parts of speech (and lemmas with different parts of speech). Typically, we have a non-base inflected form lexical entry in MD, which matches with a base-form lexical entry in CW. In such a case one would need two different lexical entries. E.g.

MD: atos
MD: Have him do something for you.
CW: atos+N+AN+Sg
CW: atos
CW: arrow
CW: atos COMP:lemma

This should be resolvable, but may require some thinking on how to produce the appropriate POS and LC info for the MD entries (which don't have CW-style LC:s, so I'd have to extract that by linking the lemma from the correct FST analysis of the MD entry with the LC in CW for the lemma).

aarppe · 2018-12-20T03:17:32Z

With some new scripting, dictionary entries end up being matched only if they belong to the same part-of-speech (by exclusion of 'conjugation' class in MD vs. CW comparisons) - the conjugated/inflected forms are now output as separate MD-only dictionary entries. So the 'atos' issue above is no longer an 'issue'.

aarppe · 2018-12-20T03:18:30Z

Checking validity of XML source and delivering warnings of bad structure to appropriate location/email is a very desirable feature.

eddieantonio added the enhancement (New feature or request) label on Dec 12, 2018

This was referenced Dec 20, 2018

Dictionary generation creates an empty <tg> for some entries #104

Open

Entry presentation issues #105

Open
