Vis2 - Image to Text OCR()

Albireo
Posts: 1284
Joined: 16 Oct 2013, 13:53

Re: Vis2 - Image to Text OCR()

03 Sep 2020, 12:37

I have tried different traineddata files in the directory C:\Temp\OCR_Tesseract\bin\tesseract\tessdata_best
The tessdata files I tested have the following sizes and runtimes:
tessdata_fast (files)
  • swe.traineddata 13,308 kB
  • eng.traineddata 4,017 kB
  • Runtime (one row): 3.531 sec.
  • Total runtime (fifteen rows): 51.218 sec.

tessdata_best (files)
  • swe.traineddata 13,990 kB
  • eng.traineddata 15,040 kB
  • Runtime (one row): 3.609 sec.
  • Total runtime (fifteen rows): 48.016 sec.

Not much difference in speed ("best" was a bit faster overall)
iseahound wrote:
02 Sep 2020, 12:26
… you may have to change the x,y,w,h values by a pixel or two - that seems to have a large effect on the final outcome...
You were right, suddenly the first lines started working without problems (y=749 instead of y=750, at 150 dpi).
Choosing the right language (swe) helped as well (ÅÄÖ started to be interpreted correctly).
I have also tested several traineddata files together, like this: OCR(imgFile, "swe+eng", [x, y, w, h]) (the result did not get worse),
but then…

Right now I am trying to get the AHK program to interpret rows 5, 6 and 7 correctly, but it fails (the rows after them are correct again).
I thought a 600 dpi image would work better, and to some extent it did, but not for these characters.
F. becomes E and 8 becomes 3 and...
These characters are in the middle of the line - everything is correct before and at the end of the line (but not these two...)

Is there any way to see what is in the selected area (x, y, w, h)?
The document is written in the font Courier (Courier is monospaced ABC KLM ÅÄÖ - does not appear correctly here)
Can this font be selected for the OCR interpretation? (Is it possible?)
iseahound
Posts: 614
Joined: 13 Aug 2016, 21:04
GitHub: iseahound

03 Sep 2020, 13:18

Yep. The problems you've brought up are the same ones that I have. Unfortunately, that is the nature of OCR: it is inexact. However, I do believe that there are solutions for this problem, but they require a substantial rewrite of Vis2.

Vis2 hasn't been properly updated in years, and I intend to keep it that way. I've been focusing hard on building the backbone of a future iteration of some kind of OCR program in AHK, starting with https://github.com/iseahound/ImagePut which provides a graphics translation library.

Likewise there is also AutoHotkey v2 to consider. I like what's going on with the language a lot. Therefore some future program will 100% be written in v2 and backported to v1.

Finally, I have real life to attend to!

At line 2081, try adding the second line shown below (the first line is the existing one):

Code: Select all

               this.preprocess(screenshot, this.fileProcessedImage)
               Vis2.Graphics.Image.Render(this.fileProcessedImage, 1, 3000) ; new line that shows what is being ocr'd for 3 seconds! If the image is too big, change 1 to 0.5.
Albireo

03 Sep 2020, 17:28

Great! ( I like it - although the images do not really fit my monitor ;) )

Is it not possible to select the font the document is written in, to increase the quality?
What resolution should the image be created in, for best results? (150dpi / 300dpi / 600dpi / 1200dpi / or ...)
Does a black-and-white or a grayscale image work best for OCR?

What happens if the image is a scanned invoice? (slightly skewed)

I do not really know how to adjust the row heights. At 1200 dpi, only a few pixels separate a "," (comma) on the line above from an "Å" on the line below. More space around the text to be OCR-interpreted seems to give better results.
iseahound

03 Sep 2020, 17:56

You're not really asking OCR questions. These are "how to scan a document" questions.

Always: for photos scan at 300 ppi. (We don't use dots per inch anymore, it's pixels per inch.)
Grayscale documents: 600 ppi.

You should scan documents in grayscale. There is an algorithm that automatically converts grayscale to black and white depending on how brightness and contrast are distributed across the image. Note that the document is upscaled first, then converted to black and white.

If you're scanning an invoice, do your best to straighten the image. OCR performs horribly on skewed images.

No, you cannot select the font. There are OCR-optimized fonts, however.

You can run OCR 3 times with different row heights; if 2 of 3 or all 3 results agree, you take that result.

Those are the only suggestions I can give you. Also you can change the scale to 0.25 or lower depending on the size of your document.
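
A rough sketch of that voting idea in AutoHotkey v1 follows. RowVariants() and MajorityVote() are hypothetical helpers, not part of Vis2, and the 2-pixel step is only an assumption that you would tune for your scan resolution.

```autohotkey
; AutoHotkey v1 sketch. Both functions are hypothetical helpers, not Vis2 code.

; Returns three [x, y, w, h] rectangles for one row: the original and two
; taller ones that grow by `delta` pixels above and below on each step.
; Assumes the row is not at the very top of the image (y stays >= 0).
RowVariants(x, y, w, h, delta := 2) {
    variants := []
    variants.Push([x, y, w, h])
    variants.Push([x, y - delta, w, h + 2 * delta])
    variants.Push([x, y - 2 * delta, w, h + 4 * delta])
    return variants
}

; Returns the first reading that occurs at least `needed` times,
; or an empty string when no reading reaches that count.
MajorityVote(readings, needed := 2) {
    for i, candidate in readings {
        candidate := Trim(candidate, " `t`r`n")     ; ignore surrounding whitespace
        if (candidate == "")
            continue                                ; an empty OCR result is not a vote
        votes := 0
        for j, other in readings {
            ; The "s" prefix forces a case-sensitive string comparison, so that
            ; numeric-looking readings such as "08" and "8" are not treated as equal.
            if (("s" . Trim(other, " `t`r`n")) == ("s" . candidate))
                votes++
        }
        if (votes >= needed)
            return candidate
    }
    return ""
}

; Stand-alone demonstration with made-up readings:
MsgBox % MajorityVote(["DF. 8", "DE 3", "DF. 8"])

; With Vis2, using OCR() the way it is called earlier in this thread:
; readings := []
; for index, rect in RowVariants(x, y, w, h)
;     readings.Push(OCR(imgFile, "swe", rect))
; text := MajorityVote(readings)
```

An empty return value means that no two readings agreed, so that row probably needs a manual look.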
Albireo

07 Sep 2020, 05:16

Thanks!
Usually the original is a file in PDF format.
That PDF file is converted to a grayscale image (at any resolution) with pdftopng.
I have mostly tested the resolutions 150/300/600 ppi (I do not know the difference between dpi and ppi).
I have not experienced any difference between the resolutions, other than that it is easier to distinguish between the rows (more pixels).

When I select only the characters "F.", the program cannot interpret them (no result).
When the preceding character is also selected ("DF."), the OCR result is "DE".
I have managed to get the program to interpret "F." in some cases,
but then the number 8 was interpreted as 3, or
the end of the line was not interpreted at all, or
the result was split over several lines.

I have no idea why the result is so random or how the result could be improved. (right now it's useless for me)

I have tested many other programs for converting my PDF files to text. These convert the rows perfectly, but the structure of the output makes it more or less impossible to separate the different columns, and they lack the ability to select only a certain field to be interpreted.
iseahound

07 Sep 2020, 11:51

Okay. You're not really using Vis2 optimally. Vis2 is designed to be a user friendly way of using OCR via a graphical user interface. You're looking for page segmentation modes in Tesseract:

Code: Select all

Page segmentation method
By default Tesseract expects a page of text when it segments an image. If you're just seeking to OCR a small region, try a different segmentation mode, using the --psm argument. Note that adding a white border to text which is too tightly cropped may also help, see issue 398.

To see a complete list of supported page segmentation modes, use tesseract -h. Here's the list as of 3.21:

  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.
The two modes most relevant here:
  8    Treat the image as a single word.
 10    Treat the image as a single character.
Source: https://github.com/tesseract-ocr/tessdoc/blob/master/ImproveQuality.md
If you know you will only encounter a subset of the characters available in the language, such as only digits, you can use the tessedit_char_whitelist configuration variable. See the FAQ for an example.
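
As a sketch of how those two options end up on the command line, here is a small AutoHotkey v1 helper. BuildTesseractCmd() is a hypothetical name, not Vis2 code. It assumes a Tesseract build that accepts --psm and -c, and a whitelist that contains no spaces or quotes. The file names in the example are made up.

```autohotkey
; AutoHotkey v1 sketch. BuildTesseractCmd() is a hypothetical helper, not Vis2 code.
; Builds: "tesseract" "image" "outputbase" [-l "lang"] [--psm N] [-c tessedit_char_whitelist=...]
BuildTesseractCmd(exe, image, outBase, lang := "", psm := "", whitelist := "") {
    q := Chr(0x22)                                  ; a double quote character
    cmd := q exe q " " q image q " " q outBase q    ; program, input image, output base name
    if (lang != "")
        cmd .= " -l " q lang q                      ; e.g. swe or swe+eng
    if (psm != "")
        cmd .= " --psm " psm                        ; page segmentation mode, 0 to 13
    if (whitelist != "")                            ; assumed: no spaces or quotes inside
        cmd .= " -c tessedit_char_whitelist=" whitelist
    return cmd
}

; Example: one text line (psm 7) that should contain only digits.
MsgBox % BuildTesseractCmd("tesseract.exe", "row5.png", "row5", "swe", 7, "0123456789")
```

The message box only displays the command string. To actually run it you would pass the string to RunWait, and whether the whitelist is honoured can depend on the Tesseract version and engine in use.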
ahk7
Posts: 308
Joined: 06 Nov 2013, 16:35

07 Sep 2020, 12:30

AFAIK Tesseract 4 should be better, see https://tesseract-ocr.github.io/tessdoc/ReleaseNotes#tesseract-release-notes-oct-29-2018---v400 (the current version is 4.1.1).
Vis2 probably uses version 3.x?

Just for reference, https://github.com/jbarlow83/OCRmyPDF is a useful program. It also uses Tesseract, plus other optimizations, to add OCR to PDFs. It is cross-platform and also works under Windows and/or Windows Subsystem for Linux (WSL); see the docs.
Albireo

07 Sep 2020, 17:39

Thanks!
iseahound wrote:
07 Sep 2020, 11:51
... You're not really using Vis2 optimally....
How do I use a page segmentation mode together with Vis2?

What I like about Vis2:
- Can be controlled with commands
- Easy to choose which areas OCR should interpret.
(Did not find anything similar)

What I find more difficult is that the OCR interpretation is not as good as I had hoped (but maybe now?).
iseahound

07 Sep 2020, 18:20

Code: Select all

            static q := Chr(0x22)
            _cmd .= q this.tesseract q " --tessdata-dir " q fast q " " q in q " " q SubStr(out, 1, -4) q
            _cmd .= (this.language) ? " -l " q this.language q : ""
            _cmd := ComSpec " /C " q _cmd q
            RunWait % _cmd,, Hide
Add the line _cmd .= " --psm 8" before the ComSpec line. It sets the page segmentation mode to 8 (treat the image as a single word):

Code: Select all

            static q := Chr(0x22)
            _cmd .= q this.tesseract q " --tessdata-dir " q fast q " " q in q " " q SubStr(out, 1, -4) q
            _cmd .= (this.language) ? " -l " q this.language q : ""
            _cmd .= " --psm 8" ; new line: page segmentation mode 8 (single word)
            _cmd := ComSpec " /C " q _cmd q
            RunWait % _cmd,, Hide
Sorry that Vis2 let you down. I've been meaning to make a large upgrade to the program, but sadly the code base is too big and unmaintainable. I've already begun splitting the different modules it relies on into their own libraries, and started removing the dependency on Gdip_All.
Pulover
Posts: 462
Joined: 29 Sep 2013, 19:51
Location: Brazil

20 Sep 2020, 10:15

This is very cool, indeed. :clap:
I'd like to integrate it into PMC.
Rodolfo U. Batista
Pulover's Macro Creator - Automation Tool (Recorder & Script Writer)
iseahound

20 Sep 2020, 11:23

Sure
