ketos linegen CLI -d is ambiguous #306
The module hasn't been touched in a long time and should definitely be revisited. At least with the older shallow network architecture, synthetic data didn't actually help in improving or even bootstrapping a rough working model.
Ah, good to know. And does that apply to handwriting, or print, or both? Also, what exactly is shallow for you here (or what is deep)? For example, Tesseract's default
Assuming I got that right, where would Kraken's old and new default fit in?
That was only for print and with the non-pytorch single-BiLSTM-layer model. The current default in Tesseract is a bit weird. After writing the initial VGSL implementation I tried to use Tesseract's specs, replicating their hyperparameters as much as possible, but I never got anything with

EDIT: On a test set (print, polytonic Greek, single font, 2.5k lines, binary) I get 99.4% character accuracy with a summarizing layer and 99.7% with the large configuration.
Interesting, thanks! I did not look much into Tesseract's training procedure yet (good to know). Its many other performance optimizations already make it impossible to precisely compare and reproduce, I'm afraid.
Do you mean the "implicit baseline normalization" (described here, p. 21)? Perhaps other systems either rely on explicit dewarping, or use 2DLSTMs, or simply try to compensate with larger input height? But your last edit suggests you did apply this successfully – so how does it compare to the same config without `LXysXX`?
On 21/11/18 04:37AM, Robert Sachunsky wrote:

> I did not look much into Tesseract's training procedure yet (good to
> know). Its many [other performance
> optimizations](tesseract-ocr/tesseract#2339)
> make it already impossible to precisely compare and reproduce, I'm
> afraid.

Yeah, I was only talking about the training procedure itself. From then
on it keeps being weird with their CTC decoder and this thing that's
similar to the many-to-many codecs kraken has, but all mixed into one
code blob.

> Do you mean the "implicit baseline normalization" (described
> [here](https://tesseract-ocr.github.io/docs/das_tutorial2016/6ModernizationEfforts.pdf)
> p. 21)? Perhaps other systems either rely on explicit dewarping, or use
> 2DLSTMs, or simply try to compensate with larger input height? But
> your last edit suggests you did apply this successfully – so how does
> it compare to the same config without `LXysXX`?
They are a bit independent, although spatial normalization is probably
one of the ideas behind the summarizing layers. Any architecture with
sufficient power will be able to generalize across baseline deviations
(input height has nothing to do with it). They probably put that in the
presentation because Thomas Breuel was at Google at the time and the old
ocropus had this heuristic `CenterLineNormalizer`. For Tesseract it is a
bit of a moot point I guess, as their line extractor is so old it can't
find anything but the straightest of lines anyway.
OCR systems using the baseline paradigm for segmentation get
auto-normalized lines for recognition, as you can just map the baseline
into the plane with a piecewise affine transform, which works well even
for extreme curvatures, while implicit (network-internal) approaches
such as STNs have limits.
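To make the idea concrete, here is a minimal sketch of flattening a curved baseline. It uses a per-column vertical shift instead of a true piecewise affine transform, and the function name and interface are made up for illustration; this is not kraken's actual code.

```python
import numpy as np

def flatten_baseline(line_img, baseline_y, target_y=None):
    # Shift every column vertically so the (possibly curved) baseline
    # lands on a single target row. This per-column shear is a crude
    # stand-in for the piecewise affine mapping described above.
    h, w = line_img.shape
    if target_y is None:
        target_y = h // 2
    out = np.zeros_like(line_img)
    for x in range(w):
        shift = target_y - int(round(baseline_y[x]))
        out[:, x] = np.roll(line_img[:, x], shift)
    return out

# toy line image whose "ink" sits on a rising diagonal baseline
img = np.zeros((16, 16))
ys = np.linspace(2, 13, 16)
for x, y in enumerate(ys):
    img[int(round(y)), x] = 1.0

flat = flatten_baseline(img, ys)
# after normalization all ink lies on the target row h // 2 == 8
```

A real implementation would interpolate between baseline control points and warp with a proper piecewise affine estimate (e.g. `skimage.transform.PiecewiseAffineTransform`) rather than rolling whole columns.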
For comparison, I get (character accuracy on the Greek print set):
* 99.76% with `[1,128,0,1 Ct3,3,16 Gn8 Mp2,2 Ct3,3,32 Gn16 Mp2,2 Ct3,3,48 Gn16 Mp2,2 Ct3,3,64 Gn32 Ct3,3,128 Gn32 S1(1x0)1,3 Lbx96 Do Lbx96 Do Lbx192 Do]`
* 99.24% with `[1,128,0,1 Ct3,3,16 Gn8 Mp2,2 Ct3,3,32 Gn16 Mp2,2 Ct3,3,48 Gn16 Mp2,2 Ct3,3,64 Gn32 Ct3,3,128 Gn32 Lfys48 Lbx96 Do Lbx96 Do Lbx192 Do]`
and the second one converges a lot slower (ca. epoch 50, in contrast to
30 for the other architecture).
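To see exactly where the two quoted specs diverge, one can tokenize them and diff the layer lists; the only difference is the height-reducing layer (the `S1(1x0)1,3` reshape vs the `Lfys48` summarizing vertical LSTM). A quick sketch:

```python
# The two VGSL specs quoted above, compared token by token.
spec_a = ("[1,128,0,1 Ct3,3,16 Gn8 Mp2,2 Ct3,3,32 Gn16 Mp2,2 Ct3,3,48 "
          "Gn16 Mp2,2 Ct3,3,64 Gn32 Ct3,3,128 Gn32 S1(1x0)1,3 "
          "Lbx96 Do Lbx96 Do Lbx192 Do]")
spec_b = ("[1,128,0,1 Ct3,3,16 Gn8 Mp2,2 Ct3,3,32 Gn16 Mp2,2 Ct3,3,48 "
          "Gn16 Mp2,2 Ct3,3,64 Gn32 Ct3,3,128 Gn32 Lfys48 "
          "Lbx96 Do Lbx96 Do Lbx192 Do]")

def tokens(spec):
    # VGSL specs are whitespace-separated layer definitions in brackets
    return spec.strip("[]").split()

only_a = set(tokens(spec_a)) - set(tokens(spec_b))
only_b = set(tokens(spec_b)) - set(tokens(spec_a))
print(only_a, only_b)  # {'S1(1x0)1,3'} {'Lfys48'}
```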
I disagree: if you do normalize ("deslope"/dewarp) the baseline in advance, then the same height contains more information. And if you rely on vertical summarization to do the job implicitly, then you obviously need larger height in the input.
It does not contain that kind of code, though.
Right, but that probably does not matter much, because you can do line detection externally (and during training you can still augment by warping).
I agree, external/explicit dewarping is probably more robust (but let's see how the new transformer / multi-head self-attention architectures fare).
I see – thanks! (Perhaps the vertical summary could be trained/regularized specially to converge faster?)
OK, I formulated that badly. For some material larger input heights result in better results (and we've seen that for many Hebrew manuscripts) but I don't believe this to be related to any improved capability to compensate for baseline position. I'm pulling this out of my ass but naïvely I'd expect implicit baseline compensation to improve with additional contextual information and not necessarily just by having the same information with a higher resolution (in fact it could be detrimental as the receptive field of the convolutional stack is limited).
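The limited receptive field mentioned above is easy to check with the standard recurrence r += (k - 1) * j; j *= s. For the conv/pool stack of the specs quoted earlier it covers only about 54 of the 128 input rows (a back-of-the-envelope sketch; kraken does not expose such a helper):

```python
def receptive_field(layers):
    # layers: sequence of (kernel_size, stride) pairs along one axis.
    r, j = 1, 1  # receptive field and jump (cumulative stride)
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# Ct3,3 = 3x3 conv stride 1; Mp2,2 = 2x2 max pool stride 2,
# in the order they appear in the spec above.
stack = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1), (2, 2), (3, 1), (3, 1)]
print(receptive_field(stack))  # 54 -- well below the 128 px input height
```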
Yes, as I said a lot of the ocropus-y features in that presentation never ended up in Tesseract.
For now they mostly seem to require more training data for the same results with slower inference. At least that's what the literature (and some quick experiments on my side) suggest.
Yeah, I didn't fiddle around with the hyperparameters much. Doing hyperparameter search with kraken is a bit of a pain right now as the datasets load so slowly. It is entirely possible that Tesseract's explicit per-layer learning rates, beyond what Adam does, were added for those layers. But IDK, in the end you can probably get the exact same result with a stack of 1xX convolutional layers when using a fixed input height.
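Per-layer learning rates amount to partitioning the parameters into groups that each carry their own step size (PyTorch's optimizer parameter groups work the same way). A minimal sketch with plain SGD; the group names and learning-rate values are made up:

```python
import numpy as np

def sgd_step(param_groups):
    # One plain SGD step where each group carries its own learning
    # rate -- the "per-layer learning rate" idea in miniature.
    for group in param_groups:
        for p in group["params"]:
            p["value"] -= group["lr"] * p["grad"]

conv = {"value": np.ones(3), "grad": np.ones(3)}
lstm = {"value": np.ones(3), "grad": np.ones(3)}
groups = [
    {"params": [conv], "lr": 1e-3},  # hypothetical: smaller lr for convs
    {"params": [lstm], "lr": 1e-2},  # hypothetical: larger lr for LSTMs
]
sgd_step(groups)
```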
In `ketos linegen`, you currently have:

You might want to rename one, e.g. `-D`.
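The general problem can be demonstrated with argparse (kraken's CLI actually uses Click, and the long option names below are hypothetical): two options sharing the short flag `-d` cannot coexist, so one must be renamed, e.g. to `-D`.

```python
import argparse

parser = argparse.ArgumentParser(prog="linegen-demo")
parser.add_argument("-d", "--disable-foo", action="store_true")

conflict_caught = False
try:
    # a second option reusing -d is rejected outright
    parser.add_argument("-d", "--distort", type=float)
except argparse.ArgumentError:
    conflict_caught = True

# renaming the second short flag to -D resolves the ambiguity
parser.add_argument("-D", "--distort", type=float)
args = parser.parse_args(["-D", "2.0"])
```

Click resolves such clashes differently, but either way only one option can ultimately own a given short flag, which is why the rename is needed.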