Although this is more of an issue for the private `altlab` repository, it still affects how the `importjson` has to be generated, so I'm adding it here (also to reference the issue from the code).

`morphodict` makes some uniqueness assumptions about the structure of the `importjson` file:

- `slug` must be unique.
- `TargetLanguageKeyword` must be unique per Wordform.
- `SourceLanguageKeyword` must be unique per Wordform.

The latter two are not explicitly shown in the `importjson`, but are calculated and generated by `morphodict` at import time from either the semantic definition of a wordform if present, or the definition otherwise. Usually this is not a problem. But when it is, it manifests itself as a cryptic failure of `UNIQUE` constraints when running the `importjsondict` management command, and it is then hard to see directly what the problem is.

- Identify the problem. The problem has been clearly identified: if two `importjsondict` entries match the same Wordform, the import-time safeguards that avoid duplicate keywords are bypassed. This currently only happens when processing an entry that is a "formOf" another entry. Bypassing the safeguards does not immediately trigger a failure; the failure only arises when the two entries share a keyword. This had not happened yet, but there was a risk of it eventually happening in CW: one entry in CW is imported without safeguards (`nisôkan` as formOf `misôkan@ndi`) but, since no keywords are shared between the senses of their definitions, the process does not fail. Now that we are merging extra entries from the other sources, I identified one entry in AECD that is imported without safeguards AND has shared keywords, `cîscahisîpwâkanis`, which generates an `importjson` that cannot be imported.
- Make `crk-db` check that the generated `importjson` file is well formed and will not trigger this kind of failure. The process of generating the database should include safeguards to avoid import failures and to allow linguists to debug the database and decide on the outcome when the data is inconsistent.
- Decide whether the import process in `morphodict` should be changed to also ensure that the uniqueness restrictions for keywords do not stop that process. My take is that we should keep the import process unchanged, but I'm putting this out as an option for discussion if others feel it is really necessary to do that instead.
- Fix the AECD entries for `cîscahisîpwâkanis`: the senses have opposite definitions, so my guess is that one of the entries is wrong and should be removed. However, that is a linguist decision.
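
As a starting point for discussing the proposed `crk-db` check, here is a minimal sketch in Python of what a pre-import validation pass over the generated `importjson` could look like. It is not the actual `morphodict` or `crk-db` code, and several things in it are assumptions: the field names (`slug`, `head`, `formOf`, `senses`, `definition`, `semanticDefinition`), the rule in `wordform_key` for deciding that two entries resolve to the same Wordform, and the `keywords` helper, which is only a simplified stand-in for whatever keyword extraction `morphodict` really performs at import time.

```python
import re
from collections import Counter, defaultdict


def keywords(text):
    """Hypothetical stand-in for keyword extraction: lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def entry_keywords(entry):
    """Keywords of one entry, preferring the semantic definition of each sense."""
    found = set()
    for sense in entry.get("senses", []):
        text = sense.get("semanticDefinition") or sense.get("definition", "")
        found |= keywords(text)
    return found


def wordform_key(entry):
    """Assumption: entries with the same head and the same formOf target
    (or the same slug, for lemma entries) resolve to the same Wordform."""
    return (entry.get("formOf") or entry.get("slug"), entry.get("head"))


def check_importjson(entries):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []

    # Rule 1: every slug must be unique.
    slug_counts = Counter(e["slug"] for e in entries if "slug" in e)
    for slug, count in slug_counts.items():
        if count > 1:
            problems.append(f"duplicate slug: {slug} ({count} entries)")

    # Rules 2 and 3 (approximated): entries that resolve to the same Wordform
    # must not contribute the same keyword more than once.
    groups = defaultdict(list)
    for entry in entries:
        groups[wordform_key(entry)].append(entry)

    for (parent, head), group in groups.items():
        if len(group) < 2:
            continue
        # entry_keywords returns a set, so each entry counts a keyword once.
        counts = Counter(kw for e in group for kw in entry_keywords(e))
        shared = sorted(kw for kw, n in counts.items() if n > 1)
        if shared:
            problems.append(
                f"{head} (under {parent}): {len(group)} entries share keywords {shared}"
            )

    return problems
```

Running `check_importjson(entries)` on the parsed `importjson` list before calling the import command would turn the cryptic `UNIQUE` failure into a list of offending heads and keywords that a linguist can review. Reporting problems instead of silently dropping one of the entries is deliberate, since deciding which entry is correct is a linguistic decision.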