How Can I Adjust Gpageseg Argument in Receipt To Segment Full Line #189

Closed
genzervn opened this issue Mar 9, 2017 · 8 comments

Comments

@genzervn

genzervn commented Mar 9, 2017

Dear everyone,
When I run "ocropus-gpageseg" on the receipt below, I want each text line to be segmented into a single PNG image and recognized as one line. Currently it is segmented into 2 PNGs and OCRed as 2 lines (line 1: GV USED, line 2: 50.00).
image
I also tried --maxcolseps 0, but the result is the same.
My expectation is a single PNG for the line, with the OCR text covering the full line (GV USED 50.00).
Could you please help me resolve this issue?
PS: The full receipt is below. I want it to be treated as one column, line by line from top to bottom.
image

Thanks and regards
Duc Dang

@zuphilip
Collaborator

zuphilip commented Mar 9, 2017

There was a very recent fix which could be connected here, see #171. Are you using the newest version? What is the output of git log in your ocropy directory?

@zuphilip
Collaborator

zuphilip commented Mar 9, 2017

Okay, now I tried it myself. There is another problem with the reading order.

You can try out the branch https://github.com/tmbdev/ocropy/tree/kassenzettel, see also #190. Does this work as you expect now?

@genzervn
Author

genzervn commented Mar 10, 2017

Thanks a lot for your quick support (y). The ordering of the result is now better and much easier to read, but honestly it is not what I expected.
Expected result in the body:
...
MELANGE OTH 59.00
TOTAL 65.00
GST ON GS 4.25
Gv USED 50.00
...
Actual result:
....
59.OO
MELANGE OTH
TOTAL
65.00
GST ON GS
4.25
Gv USED
50.00
...
It seems that it still separates each line into 2 images, but the ordering is now pretty good. Do you have any suggestions for reaching my expectation?

Thanks
Duc Dang

@zuphilip
Collaborator

Are you actually interested in the text or the images for the lines?

The ordering of the textual parts on the same line still seems to be incorrect sometimes. I will look into this.

However, the splitting into two lines, which happens because there is a lot of space between the two textual parts, seems to be more or less hardcoded in gpageseg. The first 4 lines in your example are output as one line each, but the following lines are split into two textual parts, see also the debug output:

_lineseeds

I don't know whether it is easy to change the algorithm/code of gpageseg for your goal. It might be easier to work on the hOCR output.
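To illustrate the post-processing idea, here is a minimal sketch that joins text fragments lying on the same visual row. It is not part of ocropy: the names `vertical_overlap` and `merge_rows`, the `min_overlap` threshold, and the bounding-box coordinates are all hypothetical, and it assumes you have already extracted each recognized fragment together with its `(x0, y0, x1, y1)` box (for example from the hOCR output), with y growing downward.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), y grows downward (assumed)
Fragment = Tuple[str, Box]        # recognized text plus its bounding box


def vertical_overlap(a: Box, b: Box) -> float:
    """Fraction of the shorter box's height that the two boxes share."""
    shared = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return max(0, shared) / shorter if shorter > 0 else 0.0


def merge_rows(fragments: List[Fragment], min_overlap: float = 0.5) -> List[str]:
    """Group fragments into rows by vertical overlap, then join left to right."""
    rows: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: f[1][1]):      # top to bottom
        for row in rows:
            # Compare against the first fragment of an existing row.
            if vertical_overlap(row[0][1], frag[1]) >= min_overlap:
                row.append(frag)
                break
        else:
            rows.append([frag])                                 # start a new row
    return [
        " ".join(text for text, _ in sorted(row, key=lambda f: f[1][0]))
        for row in rows
    ]


# Made-up coordinates, only to mimic the split lines reported above.
fragments = [
    ("59.OO", (300, 100, 360, 120)),
    ("MELANGE OTH", (10, 102, 150, 122)),
    ("TOTAL", (10, 130, 80, 150)),
    ("65.00", (300, 131, 360, 151)),
]
merged = merge_rows(fragments)
print(merged)
```

This only repairs the text; the line images would still be split. The overlap threshold is a guess and would probably need tuning for skewed or tightly spaced receipts.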

@genzervn
Author

Thanks Philipp,
Is there any config option to increase the space allowed before a line is split into parts? Which method in gpageseg could I change to hard-code this? Is it related to connected components?

Thank you

@zuphilip
Collaborator

Is there any config option to increase the space allowed before a line is split into parts? Which method in gpageseg could I change to hard-code this? Is it related to connected components?

I am sorry, but I don't know the answers to these questions. You can try to understand better what gpageseg is doing and then play around a little with the code...

@genzervn
Author

Thank you. I'm trying to understand gpageseg. Thank you so much for your support.

@zuphilip
Collaborator

👍 If you gain any new insights, it would be nice if you shared them.
