This repository has been archived by the owner on May 6, 2021. It is now read-only.
Basically, do a few sanity checks before starting the app:
- Do all `<source>` elements have a `<title>`?
- Do all `<source>` elements have an ID?
- Do all `<e>` elements have one `<lg>` and at least one `<mg>`?
- Do all `<mg>` elements have at least one `<tg>`?
- Do all `<tg>` elements have at least one `<t>` element?
- Do all `<t>` elements with a `sources` attribute have a unique set of sources?
- Are all values in the `<t sources>` attribute valid `<source>` IDs?
EDIT: I'm pretty sure I can validate a lot of these things by creating an XML schema, and using a schema validator, but... that might be more effort than it's worth.
Print warnings on startup, and log them LOUDLY somewhere.
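As a rough illustration, the checks listed above could be sketched with Python's standard `xml.etree.ElementTree` and `logging` modules. This is only a sketch under several assumptions that the issue does not confirm: that a source's ID lives in an `id` attribute, that the `sources` attribute is a whitespace-separated list of IDs, and that "unique set of sources" means no ID is repeated within a single attribute. The names `check_dictionary` and `warn_loudly` are hypothetical.

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger("dictionary_sanity")


def check_dictionary(root):
    """Return a list of human-readable problems found in the parsed XML."""
    problems = []

    # <source> checks. ASSUMPTION: the ID is stored in an `id` attribute.
    source_ids = set()
    for source in root.iter("source"):
        source_id = source.get("id")
        if source_id is None:
            problems.append("<source> without an ID")
        else:
            source_ids.add(source_id)
        if source.find("title") is None:
            problems.append(f"<source id={source_id!r}> has no <title>")

    # <e> needs exactly one <lg> and at least one <mg>.
    for index, entry in enumerate(root.iter("e")):
        if len(entry.findall("lg")) != 1:
            problems.append(f"<e> #{index} does not have exactly one <lg>")
        if not entry.findall("mg"):
            problems.append(f"<e> #{index} has no <mg>")

    # <mg> needs at least one <tg>; <tg> needs at least one <t>.
    for mg in root.iter("mg"):
        if mg.find("tg") is None:
            problems.append("<mg> has no <tg>")
    for tg in root.iter("tg"):
        if tg.find("t") is None:
            problems.append("<tg> has no <t>")

    # ASSUMPTION: `sources` is a whitespace-separated list of source IDs.
    for t in root.iter("t"):
        raw = t.get("sources")
        if raw is None:
            continue
        ids = raw.split()
        if len(ids) != len(set(ids)):
            problems.append(f"<t sources={raw!r}> repeats a source ID")
        for unknown in sorted(set(ids) - source_ids):
            problems.append(f"<t sources={raw!r}> names unknown source {unknown!r}")

    return problems


def warn_loudly(problems):
    """Log every problem at WARNING level so it shows up at startup."""
    for problem in problems:
        logger.warning("DICTIONARY SANITY CHECK FAILED: %s", problem)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    demo = ET.fromstring(
        '<r><source id="CW"/><e><lg/><mg><tg><t sources="CW MD">x</t></tg></mg></e></r>'
    )
    warn_loudly(check_dictionary(demo))
```

Collecting the problems in a list before logging keeps the checks easy to test and leaves it to the caller to decide whether to warn or to refuse to start.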
I had noticed this in a few cases. Currently, the reason is that some of the comparative matches/mismatches between CW and MD result from the descriptive analysis allowing for two inflected forms belonging to different parts of speech (and lemmas with different parts of speech). Typically, we have a non-base inflected-form lexical entry in MD, which matches a base-form lexical entry in CW. In such a case one would need two different lexical entries. E.g.
MD: atos MD: Have him do something for you. CW: atos+N+AN+Sg CW: atos CW: arrow CW: atos COMP:lemma
This should be resolvable, but may require some thinking on how to produce the appropriate POS and LC info for the MD entries (which don't have CW-style LCs, so I'd have to extract that by linking the lemma from the correct FST analysis of the MD entry with the LC in CW for that lemma).
With some new scripting, dictionary entries end up being matched only if they belong to the same part of speech (by exclusion of the 'conjugation' class in MD vs. CW comparisons); the conjugated/inflected forms are now output as separate MD-only dictionary entries. So the 'atos' issue above is no longer an 'issue'.
Ensure mistakes like #93 don't happen again.