Source files with wrong encoding #6324

Open
stweil opened this issue Nov 22, 2024 · 9 comments

@stweil
Member

stweil commented Nov 22, 2024

The latest code contains two property files with ISO-8859-1 encoding:

  • Kitodo/src/main/resources/messages/password_es.properties
  • Kitodo/src/main/resources/messages/errors_de.properties

They were found with `find * -type f | xargs file --mime | grep iso-8859-1`.

Here is a complete list of all encodings in the current directory:

% find * -type f | xargs file --mime | sed 's/.*charset/charset/' | sort | uniq -c
 115 charset=binary
   2 charset=iso-8859-1
1511 charset=us-ascii
 195 charset=utf-8

So there are already nearly 200 files with UTF-8 encoding (which is fine), and the two files mentioned above are the only ones with wrong encoding.
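Because `file` classifies a file by heuristics, a stricter cross-check is to decode every byte of each file. The sketch below is an illustration, not project tooling: it assumes POSIX `sh`, `find` and `iconv`, and that the offending files really are ISO-8859-1 as reported above. The helpers `find_non_utf8` and `to_utf8` are hypothetical names. The first lists `.properties` files whose complete content is not valid UTF-8. The second re-encodes one file from ISO-8859-1 to UTF-8 in place.

```shell
#!/bin/sh
# Sketch only: assumes POSIX sh, find(1) and iconv(1), and that the files
# flagged above really are ISO-8859-1 (not, e.g., Windows-1252).

# List .properties files below directory $1 (default: current directory)
# whose complete content is not valid UTF-8. iconv decodes every byte, so
# an invalid sequence anywhere in the file makes it fail.
find_non_utf8() {
  find "${1:-.}" -type f -name '*.properties' -print | sort |
    while IFS= read -r f; do
      if ! iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
        printf '%s\n' "$f"
      fi
    done
}

# Re-encode file $1 from ISO-8859-1 to UTF-8 in place. The result goes to a
# temporary file first, so the original is only overwritten if iconv succeeds.
to_utf8() {
  tmp=$(mktemp) || return 1
  if iconv -f ISO-8859-1 -t UTF-8 "$1" > "$tmp"; then
    # Copy back with cat so the original file keeps its permissions.
    cat "$tmp" > "$1"
    rc=$?
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}
```

For example, `find_non_utf8 Kitodo/src/main/resources/messages` would be expected to list the two files above. Whether converting them is the right fix depends on how Kitodo.Production decodes its message bundles, which is what the rest of this thread tries to establish.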

stweil added the bug label Nov 22, 2024
@henning-gerhardt
Collaborator

Are you aware that `file` is not a good tool for determining the encoding? It only checks a small part of a file's content and reports the first non-ASCII encoding it "detects" (or whatever its default is), so other encodings in the same file may go undetected.

ISO-8859-1 may well be correct for the German and Spanish resource files, since it covers every character they need to display. This may even show that the result of `file` is not correct in every case.

@stweil
Member Author

stweil commented Nov 22, 2024

@matthias-ronge, @joergleh, you contributed the ISO-8859-1 encodings (#5214, #5903). Did you test the messages in Kitodo.Production? Did they look correct in the frontend? Would they look different with UTF-8 encoding?

@henning-gerhardt
Collaborator

> @matthias-ronge, @joergleh, you contributed the ISO-8859-1 encodings (#5214). Did you test the messages in Kitodo.Production? Did they look correct in the frontend? Would they look different with UTF-8 encoding?

This is not an issue, as the files are read as UTF-8 and, since ISO-8859-1 is part of UTF-8, there should be no display issues.

@stweil
Member Author

stweil commented Nov 22, 2024

> This is not an issue, as the files are read as UTF-8 and, since ISO-8859-1 is part of UTF-8, there should be no display issues.

I'd prefer a test to verify your claim. Only the first 128 characters (ASCII) are identical in both encodings, so there will be differences for umlauts and Spanish characters which are not part of ASCII.
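A small byte-level test of that point, sketched in shell under the assumption that `iconv` and `od` are available. The helpers `hex` and `latin1_ano` are illustrative names only. The sketch writes "año" as ISO-8859-1 using an octal escape, compares its bytes with the UTF-8 form, and then tries to decode the ISO-8859-1 bytes strictly as UTF-8.

```shell
#!/bin/sh
# Print stdin as a compact hex string, e.g. "61f16f".
hex() { od -An -tx1 | tr -d ' \n'; }

# "año" encoded as ISO-8859-1: "ñ" is the single byte 0xF1 (octal 361).
latin1_ano() { printf 'a\361o'; }

# The ASCII bytes ("a", "o") are identical in both encodings; "ñ" is not.
latin1_ano | hex; echo                                   # 61f16f
latin1_ano | iconv -f ISO-8859-1 -t UTF-8 | hex; echo    # 61c3b16f

# Read as UTF-8, 0xF1 starts a four-byte sequence whose continuation bytes
# never follow, so strict decoding fails.
if latin1_ano | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1; then
  echo "valid UTF-8"
else
  echo "not valid UTF-8"
fi
```

So the two encodings are not interchangeable for non-ASCII characters. Correct display of ISO-8859-1 files would therefore depend on the reader, not on the encodings being compatible. For example, since Java 9, `java.util.PropertyResourceBundle` reads properties files as UTF-8 and re-reads them as ISO-8859-1 if the input is malformed UTF-8. That would render both encodings correctly, assuming Kitodo.Production loads its messages through `ResourceBundle`.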

@danilopenagos
Contributor

> @matthias-ronge, @joergleh, you contributed the ISO-8859-1 encodings (#5214, #5903). Did you test the messages in Kitodo.Production? Did they look correct in the frontend? Would they look different with UTF-8 encoding?

Hi, everyone! The Spanish frontend displays all Spanish characters and messages correctly in the version we are currently working with!

@stweil
Member Author

stweil commented Nov 26, 2024

That's surprising. I had expected that this subset of Spanish messages would not be shown correctly.

@danilopenagos
Contributor

danilopenagos commented Nov 26, 2024

> That's surprising. I had expected that this subset of Spanish messages would not be shown correctly.

These messages are shown in English. Our version is 3.5. I don't know which file encoding (ISO-8859-1 or UTF-8) this version uses.

@stweil
Member Author

stweil commented Nov 26, 2024

Strange again. Release 3.5.0 should contain the Spanish translation. Are you running a local build or an official release from GitHub?

@solth
Member

solth commented Nov 27, 2024

It may be that the installation @danilopenagos refers to uses a custom messages directory containing outdated property files that do not contain the Spanish translations in question and that are used instead of the message files distributed with the official 3.5.0 release. That should be checked.
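One way to check this, sketched under stated assumptions: compare the message keys of the deployed Spanish file with those of the same file from the 3.5.0 release. The helpers `keys` and `missing_keys` are hypothetical, and both file paths are placeholders. The key extraction is simplified. It skips `#`/`!` comment lines and only recognizes `=` or `:` as separators. It does not handle escaped characters or continuation lines.

```shell
#!/bin/sh
# Sketch only: the file paths passed in are placeholders, and the parsing
# below is a simplification of the .properties format.

# Print the sorted, unique keys of properties file $1. Lines whose first
# non-blank character is "#" or "!" (comments) and blank lines are skipped;
# only "=" and ":" are recognized as separators, and escapes or continuation
# lines are not handled.
keys() {
  sed -n 's/^[[:space:]]*\([^#![:space:]][^=:[:space:]]*\)[[:space:]]*[=:].*/\1/p' "$1" |
    sort -u
}

# Print keys present in the reference file $2 (e.g. from the official
# release) but missing from the deployed file $1.
missing_keys() {
  tmp=$(mktemp) || return 1
  keys "$1" > "$tmp"
  # comm -13 keeps only lines unique to the second input (the reference).
  keys "$2" | comm -13 "$tmp" -
  rm -f "$tmp"
}
```

For example, `missing_keys /path/to/custom/messages/password_es.properties /path/to/release/messages/password_es.properties` (placeholder paths) would print the keys that the custom file lacks. Any output would suggest that the installation uses an older message file than the release.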

4 participants