Attempt at establishing distinction between -dict-gt-norm- and -gt-norm fails #4
Comments
I tried to debug the xerox script like this:
It seems it should work, but there is also a flag diacritic in the lexicon between |
Replace rule:
which implicitly allows for flags.
Something like kaNpat >> kammat |
The present code does not work because there is a contradiction in it. What you have is basically this:
I.e., you can't tell it to make one and the same change both in the context of |
What I would like is:
When ‹ӓ› is word initial or follows an alveolar it should become ‹э›. Following a non-alveolar or a soft-sign ‹ь› it should turn to ‹е› AND ь -> 0. |
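To make the intended behaviour concrete, here is a minimal Python sketch of the rule as stated above. It is not the xfst/twolc implementation and says nothing about how the regex should be written. The set of alveolar consonants (`ALVEOLARS`) and the underlying form `сьӓдей` are assumptions made for illustration only; the function name is hypothetical.

```python
import unicodedata

A_DIAERESIS = "\u04d3"  # Cyrillic small letter a with diaeresis
SOFT_SIGN = "ь"
# Assumption: an illustrative set of alveolar consonants, not taken from the grammar.
ALVEOLARS = set("дтзснлрц")


def remove_diaeresis(word: str) -> str:
    """Map an underlying form containing a-diaeresis to normative spelling."""
    # Normalise so that a decomposed a + combining diaeresis is also matched.
    word = unicodedata.normalize("NFC", word)
    out = []
    for ch in word:
        if ch != A_DIAERESIS:
            out.append(ch)
            continue
        prev = out[-1] if out else None
        if prev is None or prev in ALVEOLARS:
            # Word-initial or after an alveolar: write э.
            out.append("э")
        else:
            # After a non-alveolar or a soft sign: write е,
            # and delete the soft sign if that is what precedes.
            if prev == SOFT_SIGN:
                out.pop()
            out.append("е")
    return "".join(out)


if __name__ == "__main__":
    # The last input is an assumed underlying form, not quoted from the lexc file.
    for form in ["пӓй", "сӓдь", "ӓрзя", "сьӓдей"]:
        print(form, "->", remove_diaeresis(form))
```

The sketch only handles lower-case input and treats the two contexts as mutually exclusive, which is the point made above: each occurrence of ‹ӓ› falls into exactly one branch.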
Don't know, but at least you should fix the regex to say what you want: |
It now says:
but it still does not work |
I have reordered the steps as follows:
and after this change it works in three out of four cases:
echo пей+N+Sg+Nom+Indef | hfst-lookup -q src/fst/generator-dict-gt-norm.hfstol
пей+N+Sg+Nom+Indef пӓй 0.000000
echo сэдь+N+Sg+Nom+Indef | hfst-lookup -q src/fst/generator-dict-gt-norm.hfstol
сэдь+N+Sg+Nom+Indef сӓдь 0.000000
echo седей+N+Sg+Nom+Indef | hfst-lookup -q src/fst/generator-dict-gt-norm.hfstol
седей+N+Sg+Nom+Indef седей+N+Sg+Nom+Indef+? inf
echo эрзя+N+Sg+Nom+Indef | hfst-lookup -q src/fst/generator-dict-gt-norm.hfstol
эрзя+N+Sg+Nom+Indef ӓрзя 0.000000
Only the |
I wanted to try a different ordering as well, but got this:
|
Sorry about that, fixed now. |
Now it dies in a different place:
but if I do make distclean, then it dies for lack of a rule to generate the .hfst as given above. |
That seems completely unrelated, I have no idea. Wipe and reclone? |
Four example words have been selected to provide the *e vs *ä distinction found in the manuscript of the monolingual Erzya dictionary by Kuzʹma Abramov.
In the lexc file we have:
‹ӓ› has been declared in twolc.
The filter ‹remove-diaereses-enhancement.regex› looks like this:
So, there are a number of things going on in one place.
Line 1 removes an underlying soft sign preceding underlying ‹ӓ› and simultaneously replaces that ‹ӓ› with ‹е›. (failure)
Line 2 replaces underlying ‹ӓ› with ‹е›. (partial success)
Line 3 replaces underlying ‹ӓ› with ‹э› following specific consonants. (partial success)
Line 4 replaces underlying ‹ӓ› with ‹э› word-initially. (partial success)
The script remove-diaereses-enhancement.hfst is called in lang-myv/src/fst/Makefile.am and lang-myv/src/fst/filters/Makefile.am.
The desired result for the four words given above would be:
Analysis
Dict-Generation:
Generation:
Instead, I get:
Analysis
Dict-Generation:
Generation: