Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default TIF, not PDF, for Reliability. #58

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

NickDaly
Copy link

  1. Previously, we used PDFs to store the data, which zbarimg can't interpret reliably. Now, we store the data in multi-page TIF files which are significantly smaller files that zbarimg can reliably read. The critical change is on line 1281, where I specified format="tiff" and a compression scheme.

  2. Additionally, I changed the default output. Now, instead of printing to stdout by default, we save to a file. This saves the user from needing to pick the file extension and unintentionally write to unreliable PDFs. Now, you must use "-" to manually specify outputting to stdout. The critical change is on line 1110.

With these two changes, I'm able to write and restore 750KB backups (bz2 compressed to 200KB) in 40 pages. The rest of the changes are related to renaming variables to be more generic and fixing the relevant help-text.

I recommend testing these modes with this backup command:

rm qr-backup.log
time python3 ./qr-backup -v --instructions cover ~/test.tar.bz2 \
2>&1 | tee qr-backup.log

And this restore command:

rm qr-restore.log
time python3 ./qr-backup -v --restore ~/test.tar.bz2.qr.tif -o \
~/test.tar.bz2 2>&1 | tee qr-restore.log

1. Previously, we used PDFs to store the data, which `zbarimg` can't
interpret reliably.  Now, we store the data in multi-page TIF files
which are significantly smaller files that `zbarimg` can reliably
read.  The critical change is on line 1281, where I specified
`format="tiff"` and a compression scheme.

2. Additionally, I changed the default output.  Now, instead of printing
to stdout by default, we save to a file.  This saves the user from
needing to pick the file extension and unintentionally write to
unreliable PDFs.  Now, you must use "-" to manually specify outputting
to stdout.  The critical change is on line 1110.

With these two changes, I'm able to write and restore 750KB
backups (bz2 compressed to 200KB) in 40 pages.  The rest of the
changes are related to renaming variables to be more generic and
fixing the relevant help-text.

I recommend testing these modes with this backup command:

    rm qr-backup.log
    time python3 ./qr-backup -v --instructions cover ~/test.tar.bz2 \
    2>&1 | tee qr-backup.log

And this restore command:

    rm qr-restore.log
    time python3 ./qr-backup -v --restore ~/test.tar.bz2.qr.tif -o \
    ~/test.tar.bz2 2>&1 | tee qr-restore.log
@NickDaly NickDaly changed the title Default TIF not PDF, for Reliability. Default TIF, not PDF, for Reliability. Oct 19, 2024
@za3k
Copy link
Owner

za3k commented Oct 19, 2024

Thanks for the PR. I have a few concerns here.

  1. To start with, can you provide some hard data on restore success rates before and after this change? I want to, at minimum, make sure this is a reliability improvement. I've seen a lot of "I changed some random stuff and now it works" suggestions around zbarimg.
  2. Can you test with a version of zbarimg that includes my own hack fix mchehab/zbar@baeaa28 ? (edit: though since not everyone has this version, the change may still be worthwhile--just for info)
  3. I'm not sure if people know how to print a multi-page tif file. Many default print programs will print only the first page in a multi-page image (gif or tiff).
  4. PDFs are not actually unreliable. zbarimg is. I'd rather fix zbarimg than qr-backup, if feasible.

tmp_pdf = io.BytesIO()
pages[0].save(tmp_pdf, format="pdf", save_all=True, append_images=pages[1:], resolution=dpi, producer="qr-backup", title=f"qr-backup paper backup of {restore_file} with sha256 {sha256sum} and length {original_len}", creationDate=backup_date, modDate=backup_date)
tmp_file = io.BytesIO()
pages[0].save(tmp_file, format="tiff", compression="tiff_lzw", save_all=True, append_images=pages[1:], resolution=dpi, producer="qr-backup", title=f"qr-backup paper backup of {restore_file} with sha256 {sha256sum} and length {original_len}", creationDate=backup_date, modDate=backup_date)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed that the LZW patent has expired, so this should be okay for open-source

@NickDaly
Copy link
Author

Thanks for taking a look. I agree that fixing zbarimg is the best fix and I'll look into that. I'll also test with your zbarimg fix to verify whether that addresses the issue (I was testing with Debian Stable's zbarimg v0.23.92). I'll plan on providing an update before November.

@za3k
Copy link
Owner

za3k commented Oct 20, 2024

If you can show that 100% of QRs can be read from TIFF, and I find there are no big printer issues, I'll accept this PR.

I realized there's another major advantage, which is that running on debian currently requires a workaround due to the situation with ghostscript+imagemagick on debian.

@NickDaly
Copy link
Author

NickDaly commented Nov 1, 2024

I'll upload the logs when I'm home tonight, but I had interesting results after testing with the updated (higher-resolution) zbar:

  1. The ~50-page PDF restored correctly.
  2. The PDF did not pass the self-test performed during the backup process.
    That self-test step of the backup process took about half an hour and still failed, which is much slower than the actual restore process. I'd need to do more debugging to understand why.

Overall, the process works fine with the updated zbar, even though it doesn't appear to during the backup process.

@za3k
Copy link
Owner

za3k commented Nov 1, 2024

The restore process manually converts the pdfs to images before feeding them to zbar to work around the previous bug. An hour is really long! I should fix that, but it should be a separate bug to keep discussion clear. I plan to be very busy this month, so this is really bad timing for me to code on qr-backup, apologies

@NickDaly
Copy link
Author

NickDaly commented Nov 3, 2024

I think, at this point, we can assume no further changes to qr-backup are needed. The smaller TIF filesize and improved restore speed are nice, but it's probably not worth the complexity for the end-user of switching away from PDF.

The logs I got with my file are attached below, we had 1,300 codes to restore and it took 1,360 codes to do it, so some images didn't read correctly, even with the updated 900px resolution. However, that was within the margin for error. I also added some timestamps: the restore-test took a half-hour, while the restore only took a minute.

Please let me know if you have any further questions.

qr-restore-pdf.log
qr-backup-pdf.log

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants