Releases: mittagessen/kraken
Releases · mittagessen/kraken
4.3.0
What's Changed
- Pretraining has been reimplemented to be more faithful to the original publication for more stable memory consumption and easier hyperparameter selection
- Learning rate warmup and backbone freezing in recognition training with
--warmup
and--freeze-backbone
(mostly to enable fine-tuning pretrained models) - Enable
ketos compile
to create precompiled datasets with lines without a corresponding transcription with the--keep-empty-lines
switch (mostly for pretraining models). --failed-sample-threshold
in training modules, aborting training after a certain number of samples failed to load- tensorboard logging with
--logger/--log-dir
options - Change codec construction during training when training and validation dataset alphabets don't match. Prior code points that only exist in the validation set would be copied to the model codec. Now the model codec only contains trained code points.
- Replace
ocr_record
with new smart classesBaselineOCRRecord
andBBoxOCRRecord
. These keep track of reading/display order, compute bounding polygons from the whole line bounding polygon, and average confidences when slicing. - ALTO parsing now deals with any reasonable PointsType (see altoxml/schema#49)
- The fallback line orientation heuristic now takes into account the principal text orientation defined with
--text-direction
instead of assuming horizontal lines (--text-direction horizontal-lr/-rl
). - Baseline segmentation now supports padding of input images with
--pad
. - CLI now allows serialization with custom jinja2 templates through the
--template
option. - Switch validation metrics computation to torchmetrics.
- Various bugfixes, mostly to deal with shapely shenanigans.
Thanks
- @sixtyfive, @anutkk, @stweil, @colibrisson, @PonteIneptique for their contributions to this release.
Full Changelog: 4.2.0...4.3.0
4.1.2
3.0.6
This is mainly a bugfix release containing small improvements such as additional tests, typing, spelling corrections, additional contrib scripts, and fixes for rarely used functionality.
Bugfixes
- Orthography and missing help messages in the CLI drivers
- Documentation for batch input specifications
- Fix a regression in early stopping when training on GPU
- Fix a regression in polygonization in the presence of regions
- Do not duplicate regions during serialization
- Add dummy String beneath
TextLine
w/o text in ALTO to avoid standard-violating emptyTextLine
s - The codec loading functionality of
ketos train
andKrakenTrainer
actually loads a given codec now. - Fall back to simple scaling when centerline dewarping fails
- Drop (duplicate) short option form -p for --pad in all ketos commands
Features
- The forced alignment script
contrib/forced_alignment_overlay.py
now preserves the input file and only replaces the character cuts. - Add reading order tests
- Explicit model sanity checks in blla.segment()
- Add baseline offset options to repolygonization script
- Make codec self-synchronizing
- Add
TextEquiv
forWord
andTextLine
in PAGE XML output - Raise PIL image size limit to 20k*20k image dimensions