Default TIF, not PDF, for Reliability. #58

NickDaly · 2024-10-19T19:12:23Z

Previously, we used PDFs to store the data, which zbarimg can't interpret reliably. Now, we store the data in multi-page TIF files which are significantly smaller files that zbarimg can reliably read. The critical change is on line 1281, where I specified format="tiff" and a compression scheme.
Additionally, I changed the default output. Now, instead of printing to stdout by default, we save to a file. This saves the user from needing to pick the file extension and unintentionally write to unreliable PDFs. Now, you must use "-" to manually specify outputting to stdout. The critical change is on line 1110.

With these two changes, I'm able to write and restore 750KB backups (bz2 compressed to 200KB) in 40 pages. The rest of the changes are related to renaming variables to be more generic and fixing the relevant help-text.

I recommend testing these modes with this backup command:

rm qr-backup.log
time python3 ./qr-backup -v --instructions cover ~/test.tar.bz2 \
2>&1 | tee qr-backup.log

And this restore command:

rm qr-restore.log
time python3 ./qr-backup -v --restore ~/test.tar.bz2.qr.tif -o \
~/test.tar.bz2 2>&1 | tee qr-restore.log

1. Previously, we used PDFs to store the data, which `zbarimg` can't interpret reliably. Now, we store the data in multi-page TIF files which are significantly smaller files that `zbarimg` can reliably read. The critical change is on line 1281, where I specified `format="tiff"` and a compression scheme. 2. Additionally, I changed the default output. Now, instead of printing to stdout by default, we save to a file. This saves the user from needing to pick the file extension and unintentionally write to unreliable PDFs. Now, you must use "-" to manually specify outputting to stdout. The critical change is on line 1110. With these two changes, I'm able to write and restore 750KB backups (bz2 compressed to 200KB) in 40 pages. The rest of the changes are related to renaming variables to be more generic and fixing the relevant help-text. I recommend testing these modes with this backup command: rm qr-backup.log time python3 ./qr-backup -v --instructions cover ~/test.tar.bz2 \ 2>&1 | tee qr-backup.log And this restore command: rm qr-restore.log time python3 ./qr-backup -v --restore ~/test.tar.bz2.qr.tif -o \ ~/test.tar.bz2 2>&1 | tee qr-restore.log

za3k · 2024-10-19T19:22:56Z

Thanks for the PR. I have a few concerns here.

To start with, can you provide some hard data on restore success rates before and after this change? I want to, at minimum, make sure this is a reliability improvement. I've seen a lot of "I changed some random stuff and now it works" suggestions around zbarimg.
Can you test with a version of zbarimg that includes my own hack fix mchehab/zbar@baeaa28 ? (edit: though since not everyone has this version, the change may still be worthwhile--just for info)
I'm not sure if people know how to print a multi-page tif file. Many default print programs will print only the first page in a multi-page image (gif or tiff).
PDFs are not actually unreliable. zbarimg is. I'd rather fix zbarimg than qr-backup, if feasible.

za3k · 2024-10-19T19:26:27Z

qr-backup

-    tmp_pdf = io.BytesIO()
-    pages[0].save(tmp_pdf, format="pdf", save_all=True, append_images=pages[1:], resolution=dpi, producer="qr-backup", title=f"qr-backup paper backup of {restore_file} with sha256 {sha256sum} and length {original_len}", creationDate=backup_date, modDate=backup_date)
+    tmp_file = io.BytesIO()
+    pages[0].save(tmp_file, format="tiff", compression="tiff_lzw", save_all=True, append_images=pages[1:], resolution=dpi, producer="qr-backup", title=f"qr-backup paper backup of {restore_file} with sha256 {sha256sum} and length {original_len}", creationDate=backup_date, modDate=backup_date)


Confirmed that the LZW patent has expired, so this should be okay for open-source

NickDaly · 2024-10-19T20:59:32Z

Thanks for taking a look. I agree that fixing zbarimg is the best fix and I'll look into that. I'll also test with your zbarimg fix to verify whether that addresses the issue (I was testing with Debian Stable's zbarimg v0.23.92). I'll plan on providing an update before November.

za3k · 2024-10-20T18:03:16Z

If you can show that 100% of QRs can be read from TIFF, and I find there are no big printer issues, I'll accept this PR.

I realized there's another major advantage, which is that running on debian currently requires a workaround due to the situation with ghostscript+imagemagick on debian.

NickDaly · 2024-11-01T13:41:27Z

I'll upload the logs when I'm home tonight, but I had interesting results after testing with the updated (higher-resolution) zbar:

The ~50-page PDF restored correctly.
The PDF did not pass the self-test performed during the backup process.
That self-test step of the backup process took about half an hour and still failed, which is much slower than the actual restore process. I'd need to do more debugging to understand why.

Overall, the process works fine with the updated zbar, even though it doesn't appear to during the backup process.

za3k · 2024-11-01T15:38:09Z

The restore process manually converts the pdfs to images before feeding them to zbar to work around the previous bug. An hour is really long! I should fix that, but it should be a separate bug to keep discussion clear. I plan to be very busy this month, so this is really bad timing for me to code on qr-backup, apologies

NickDaly · 2024-11-03T22:00:09Z

I think, at this point, we can assume no further changes to qr-backup are needed. The smaller TIF filesize and improved restore speed are nice, but it's probably not worth the complexity for the end-user of switching away from PDF.

The logs I got with my file are attached below, we had 1,300 codes to restore and it took 1,360 codes to do it, so some images didn't read correctly, even with the updated 900px resolution. However, that was within the margin for error. I also added some timestamps: the restore-test took a half-hour, while the restore only took a minute.

Please let me know if you have any further questions.

qr-restore-pdf.log
qr-backup-pdf.log

NickDaly changed the title ~~Default TIF not PDF, for Reliability.~~ Default TIF, not PDF, for Reliability. Oct 19, 2024

za3k reviewed Oct 19, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Default TIF, not PDF, for Reliability. #58

Default TIF, not PDF, for Reliability. #58

NickDaly commented Oct 19, 2024

za3k commented Oct 19, 2024 •

edited

Loading

za3k Oct 19, 2024

NickDaly commented Oct 19, 2024

za3k commented Oct 20, 2024

NickDaly commented Nov 1, 2024

za3k commented Nov 1, 2024

NickDaly commented Nov 3, 2024 •

edited

Loading

Default TIF, not PDF, for Reliability. #58

Are you sure you want to change the base?

Default TIF, not PDF, for Reliability. #58

Conversation

NickDaly commented Oct 19, 2024

za3k commented Oct 19, 2024 • edited Loading

za3k Oct 19, 2024

Choose a reason for hiding this comment

NickDaly commented Oct 19, 2024

za3k commented Oct 20, 2024

NickDaly commented Nov 1, 2024

za3k commented Nov 1, 2024

NickDaly commented Nov 3, 2024 • edited Loading

za3k commented Oct 19, 2024 •

edited

Loading

NickDaly commented Nov 3, 2024 •

edited

Loading