Standardize function words and capitalization in MD and AECD to match conventions followed in CW #121
It turns out that the English definitions begin with 490 distinct initial words (word types), which are listed below in reverse frequency order. This suggests that a large part of the English definitions can be standardized automatically, while possible proper names that need to stay capitalized can be reviewed manually.
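A minimal sketch of how such a count could be produced, assuming the definitions are available as a plain list of strings. The function name and sample data are hypothetical, not the script used for the numbers above:

```python
from collections import Counter


def initial_word_counts(definitions):
    """Count the first word of each definition, preserving case.

    Assumption: `definitions` is an iterable of plain English definition
    strings; the real data source and its format are not shown here.
    """
    counts = Counter()
    for definition in definitions:
        words = definition.split()
        if words:  # skip empty definitions
            counts[words[0]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample definitions, for illustration only.
    sample = ["S/he sheds", "a pot", "The river", "s/he sings", "a dog"]
    counts = initial_word_counts(sample)
    print(len(counts), "initial word types")
    # Most frequent first; capitalized words (possible proper names)
    # are collected separately for manual review.
    for word, n in counts.most_common():
        print(n, word)
    print("review:", [w for w in counts if w[:1].isupper()])
```

Because case is preserved, `S/he` and `s/he` are counted as separate types, which is exactly what makes the capitalization inconsistencies visible.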
The following script also gives the initial words when definitions are split into their constituent senses:
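As an illustration only (this is not the script referred to above), a minimal sketch of per-sense counting, assuming that senses within one definition are separated by semicolons. The separator and function names are hypothetical:

```python
import re
from collections import Counter

# Assumption: senses within one definition are separated by semicolons.
SENSE_SEPARATOR = re.compile(r"\s*;\s*")


def senses(definition):
    """Split a definition into its non-empty senses."""
    return [s for s in SENSE_SEPARATOR.split(definition.strip()) if s]


def initial_words_by_sense(definitions):
    """Count the first word of every sense across all definitions."""
    counts = Counter()
    for definition in definitions:
        for sense in senses(definition):
            counts[sense.split()[0]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample: the first definition has two senses.
    print(initial_words_by_sense(["s/he sheds; s/he moults", "a pot"]))
```

Counting per sense rather than per definition catches capitalized or article-initial senses that appear after the first separator.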
There was a bug in entry merging when entries contain notes. It showed up as pinawêw not being returned for "she sheds": removing the note left the definition as "s/hesheds" instead of "s/he sheds".
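A minimal sketch of note removal that cannot fuse neighbouring words, assuming (hypothetically) that notes are written in square brackets; the real note syntax in MD/AECD entries may differ, and the buggy pattern is only one way such a fusion can arise, not necessarily the actual cause:

```python
import re

# Hypothetical note syntax: any text in square brackets.
NOTE = re.compile(r"\[[^\]]*\]")

# For contrast: deleting a note together with the spaces around it
# fuses the neighbouring words ("s/he [note] sheds" -> "s/hesheds").
BUGGY_NOTE = re.compile(r"\s*\[[^\]]*\]\s*")


def strip_notes(definition):
    """Remove notes, then normalize whitespace so words never fuse."""
    # Replace each note with a space rather than the empty string.
    without_notes = NOTE.sub(" ", definition)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(without_notes.split())


if __name__ == "__main__":
    print(strip_notes("s/he [note] sheds"))
    print(BUGGY_NOTE.sub("", "s/he [note] sheds"))
```

Replacing with a space and then collapsing whitespace is a simple invariant: whatever the note contains, the words on either side stay separated.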
(From #99) We would like to implement some form of standardization for the ALTLab versions of the definitions from MD and AECD. For instance: removing the initial article, i.e. `a`, `an`, `the`, and lower-casing initial pronouns in AECD (e.g. `S/he` -> `s/he`), perhaps even generalizing the masculine `He` in MD to `s/he`, as in CW. We might do an initial pass of this programmatically, always keeping the original definition for reference, but then having a standardized version for public consumption in itwêwina.
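A minimal sketch of such a programmatic first pass, keeping the original next to the standardized text. The function name, the dictionary keys and the `"MD"`/`"AECD"` source labels are hypothetical, and the rules are an illustration of the proposals above, not an agreed specification:

```python
import re

# Initial articles to drop, as proposed: "a", "an", "the".
ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)
# MD's masculine initial "He" (the \b keeps words like "Help" intact).
INITIAL_HE = re.compile(r"He\b")


def standardize(definition, source):
    """Return the original definition together with a standardized one.

    `source` is a hypothetical label, "MD" or "AECD".
    """
    text = definition.strip()
    # 1. Remove an initial article.
    text = ARTICLE.sub("", text, count=1)
    # 2. Lower-case an initial "S/he" (e.g. in AECD) to CW-style "s/he".
    if text.startswith("S/he"):
        text = "s/he" + text[len("S/he"):]
    # 3. Optionally generalize MD's masculine initial "He" to "s/he".
    elif source == "MD" and INITIAL_HE.match(text):
        text = "s/he" + text[len("He"):]
    return {"original": definition, "standardized": text}


if __name__ == "__main__":
    for definition, source in [("a pot", "AECD"), ("S/he sheds", "AECD"),
                               ("He sheds", "MD"), ("Help", "MD")]:
        print(standardize(definition, source))
```

Other capitalized initial words are deliberately left alone, since they may be proper names that should go to manual review rather than be lower-cased automatically.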