This repository has been archived by the owner on May 6, 2021. It is now read-only.

Remove entries with no translations/definitions #209

Open
eddieantonio opened this issue Jan 14, 2020 · 1 comment
Labels
bug Something isn't working

Comments

@eddieantonio
Member

While fixing #208, I found the following entry:

<e>
   <lg>
      <l pos="V">mamisim&#234;w</l>
      <lc>VTA-1</lc>
      <stem>mamisim-</stem>
   </lg>
</e>

It is missing its <mg>, <tg>, and <t> elements. Every entry should have at least one <t> element, with its sources marked.
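To see how many entries are affected, a scan like the following could help. This is only a minimal sketch: the `<r>` root element, the second sample entry, and the function name `entries_without_translations` are assumptions made for illustration and are not taken from the real dictionary file.

```python
import xml.etree.ElementTree as ET

# Sample data: the first <e> is the entry quoted above; the second <e> and
# the <r> wrapper are hypothetical, added only so the scan has a contrast.
SAMPLE = """<r>
<e>
   <lg>
      <l pos="V">mamisim&#234;w</l>
      <lc>VTA-1</lc>
      <stem>mamisim-</stem>
   </lg>
</e>
<e>
   <lg>
      <l pos="V">hypothetical</l>
   </lg>
   <mg>
      <tg>
         <t>hypothetical gloss</t>
      </tg>
   </mg>
</e>
</r>"""


def entries_without_translations(root):
    """Return the lemma of every <e> that has no non-empty <t> descendant."""
    missing = []
    for entry in root.iter("e"):
        # iter("t") finds <t> at any depth, so the exact nesting of
        # <mg>/<tg>/<t> does not need to be known here.
        has_translation = any((t.text or "").strip() for t in entry.iter("t"))
        if not has_translation:
            missing.append(entry.findtext("lg/l", default=""))
    return missing


root = ET.fromstring(SAMPLE)
print(entries_without_translations(root))
```

Run against the full XML file (via `ET.parse(path).getroot()`), the same function would list every lemma that needs to be fixed or removed.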

@aarppe, any ideas why this happened?

eddieantonio added the bug label Jan 14, 2020
@aarppe
Contributor

aarppe commented Jan 14, 2020

My best guess, without looking at the GAWK code, is that the numeric character reference &#234; (for ê) results in mamisim&#234;w not matching any lemma in the comparison CSV file, or matching only partially, so the AWK script ends up not outputting any glosses. The alternative reason might be that the comparison file has an unusual comparison class 'Err/Orth', which doesn't toggle any of the standard output actions, so none is matched.

mamisimew He keeps telling on him repeatedly. mamisimêw+V+TA+Ind+Prs+3Sg+4Sg/PlO mamisimêw s/he tells on s.o., s/he tattle on s.o., s/he rats on s.o. mamisimêw Err/Orth
The dictionary entry, English gloss, and inflectional code for that case are

mamisimêw ᒪᒥᓯᒣᐤ VTA-1 s/he tells on s.o., s/he tattle on s.o., s/he rats on s.o.
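The first guess above is easy to illustrate outside the AWK script. The sketch below does not reproduce the GAWK code, which I have not looked at. It only shows that the raw string from the XML differs from the lemma until the character reference is decoded, and that the first field of the comparison row matches only after the circumflex is stripped. The helper names `decode_lemma` and `strip_diacritics` are made up for this example.

```python
import html
import unicodedata


def decode_lemma(raw):
    """Decode character references such as &#234; and normalise to NFC."""
    return unicodedata.normalize("NFC", html.unescape(raw))


def strip_diacritics(text):
    """Remove combining marks, so that e-circumflex becomes plain e."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


xml_lemma = "mamisim&#234;w"       # as written in the XML entry
csv_lemma = "mamisim\u00eaw"       # lemma as written in the dictionary line
csv_first_field = "mamisimew"      # first field of the comparison row

# A raw text comparison fails, because the reference is not decoded.
print(xml_lemma == csv_lemma)

# After decoding, the two lemmas are identical.
print(decode_lemma(xml_lemma) == csv_lemma)

# The unaccented first field matches only once diacritics are removed.
print(strip_diacritics(decode_lemma(xml_lemma)) == csv_first_field)
```

If the script compares raw text, decoding the lemma before the lookup would rule this cause out. The 'Err/Orth' class would then be the remaining suspect.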

In the longer term, I think it'd be best to work directly from Arok's Toolbox source file (which we've converted into CSV, and from that into the XML format that follows the Norwegian tradition and is less well documented), since it also gives us other information (like sources, and the derivation that we've excluded from the XML), unless we ourselves need the XML structure for anything. I suppose we could still support XML as a source as well (for some other language), as long as that source is well-formed.
