Vis2 - Image to Text OCR()

Post your working scripts, libraries and tools for AHK v1.1 and older

Re: Vis2 - Image to Text OCR()

Post by Albireo » 03 Sep 2020, 12:37

I have tried different files in the directory C:\Temp\OCR_Tesseract\bin\tesseract\tessdata_best
The tessdata files I tested have the following properties:
tessdata_fast (files)
  • swe.traineddata 13.308kB
  • eng.traineddata 4.017kB
  • Runtime (one row): 3.531000 sec.
  • Total runtime (fifteen rows): 51.218000 sec.

tessdata_best (files)
  • swe.traineddata 13.990kB
  • eng.traineddata 15.040kB
  • Runtime (one row): 3.609000 sec.
  • Total runtime (fifteen rows): 48.016000 sec.

Not much difference in speed ("best" was slightly slower on a single row but a bit faster overall)
iseahound wrote:
02 Sep 2020, 12:26
… you may have to change the x,y,w,h values by a pixel or two - that seems to have a large effect on the final outcome...
You were right, suddenly the first lines started working without problems (y=749 instead of y=750, at 150 dpi).
Choosing the right language (swe) helped as well (ÅÄÖ began to be interpreted correctly).
I have also tested using several traineddata files, like this: OCR(imgFile, "swe+eng", [x, y, w, h]) (the result did not get worse)
but, then…

Right now I am trying to get the AHK program to interpret rows 5, 6 and 7 correctly, but without success (the rows after them are correct again).
I thought it would be better with an image at 600 dpi, and to some extent it was. But not for these characters:
"F." becomes "E" and "8" becomes "3" and...
These characters are in the middle of the line. Everything before them and at the end of the line is correct (but not these two...)

Is there any way to see what is in the selected area (x, y, w, h) ?
The document is written with the font Courier (Courier is monospaced ABC KLM ÅÄÖ - Does not appear correctly here)
How is this font selected in the OCR interpretation? (Is it possible?)

Post by iseahound » 03 Sep 2020, 13:18

Yep. The problems you've brought up are the same ones that I have. Unfortunately, that is the nature of OCR: it is inexact. However, I do believe that there are solutions for this problem, but they require a substantial rewrite of Vis2.

Vis2 hasn't been properly updated in years, and I intend to keep it that way. I've been focusing hard on building the backbone of a future iteration of some kind of OCR program in AHK, starting with https://github.com/iseahound/ImagePut which provides a graphics translation library.

Likewise there is also AutoHotkey v2 to consider. I like what's going on with the language a lot. Therefore some future program will 100% be written in v2 and backported to v1.

Finally, I have real life to attend to!

Near line 2081 of Vis2.ahk, try adding the second line shown below (the first line is the existing one):

Code:

               this.preprocess(screenshot, this.fileProcessedImage)
               Vis2.Graphics.Image.Render(this.fileProcessedImage, 1, 3000) ; new line that shows what is being ocr'd for 3 seconds! If the image is too big, change 1 to 0.5.

Post by Albireo » 03 Sep 2020, 17:28

Great! ( I like it - although the images do not really fit my monitor ;) )

Is it not possible to select the font the document is written in, to increase the quality?
What resolution should the image be created in for best results? (150dpi / 300dpi / 600dpi / 1200dpi / or ...)
Does a black-and-white or a grayscale image work best for OCR?

What happens if the image is a scanned invoice? (slightly skewed)

I do not really know how to adjust the row heights. At 1200dpi resolution, only a few pixels separate the "," (comma) on the line above from the "Å" on the line below. More space around what should be OCR-interpreted seems to give better results.

Post by iseahound » 03 Sep 2020, 17:56

You're not really asking OCR questions. These are "how to scan a document" questions.

Always: for photos scan at 300 ppi. (We don't use dots per inch anymore, it's pixels per inch.)
Grayscale documents: 600 ppi.

You should scan documents in grayscale. There is an algorithm that automatically converts grayscale to black and white depending on how the brightness and contrast are distributed around the image. Note that the document is upscaled first, then changed into black and white.

If you're scanning an invoice you should do your best to straighten the image. OCR performs very poorly on skewed images.

No, you cannot select the font. There are OCR-optimized fonts, however.

You can run OCR 3 times with different row heights; if 2 of 3 or all 3 results agree, take that result.
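
The voting idea could be sketched roughly as follows. This is only an illustration, not part of Vis2: MajorityOf3 and SameText are hypothetical helpers, the file path and crop values are placeholders, and the OCR(image, language, [x, y, w, h]) call form is taken from Albireo's earlier post rather than checked against the Vis2 source.

```autohotkey
#Include <Vis2>

; Hypothetical helper: case-sensitive string equality.
; Appending "|" stops AHK v1 from comparing numeric-looking strings
; (for example "08" and "8") as numbers.
SameText(a, b) {
    return (a . "|") == (b . "|")
}

; Hypothetical helper: returns the text that at least two of the three
; results agree on, or an empty string when all three differ.
MajorityOf3(a, b, c) {
    a := Trim(a, " `t`r`n")
    b := Trim(b, " `t`r`n")
    c := Trim(c, " `t`r`n")
    if (SameText(a, b) || SameText(a, c))
        return a
    if (SameText(b, c))
        return b
    return ""
}

imgFile := "C:\Temp\invoice.png"        ; placeholder path
x := 100, y := 749, w := 400, h := 20   ; placeholder crop

; Read the same row three times with slightly different row heights.
r1 := OCR(imgFile, "swe", [x, y - 1, w, h + 2])
r2 := OCR(imgFile, "swe", [x, y,     w, h])
r3 := OCR(imgFile, "swe", [x, y + 1, w, h - 2])

MsgBox % "Agreed result: " . MajorityOf3(r1, r2, r3)
```

An empty result means the three runs all disagreed, which is probably a sign that the field needs manual review rather than another guess.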

Those are the only suggestions I can give you. Also you can change the scale to 0.25 or lower depending on the size of your document.

Post by Albireo » 07 Sep 2020, 05:16

Thanks!
Usually the original is a file in PDF format.
That PDF file is converted to a grayscale image (any resolution) with pdftopng.
I have mostly tested the resolutions 150/300/600 ppi (I do not know the difference between dpi and ppi).
I have not noticed any difference between the resolutions, other than that it is easier to distinguish between the rows (more pixels).

When I select just the characters "F.", the program cannot interpret them (no result).
When the preceding character is also selected ("DF."), the OCR result is "DE".
I have managed to get the program to interpret "F." in some cases,
but then either the number 8 has been interpreted as 3,
the end of the line has not been interpreted at all, or
the result has been spread over several lines.

I have no idea why the result is so random or how it could be improved (right now it's useless for me).

I have tested many other programs to convert my PDF files to text. They translate the rows perfectly, but the output structure makes it more or less impossible to separate the different columns, and they lack the ability to select only a certain field to be interpreted.

Post by iseahound » 07 Sep 2020, 11:51

Okay. You're not really using Vis2 optimally. Vis2 is designed to be a user friendly way of using OCR via a graphical user interface. You're looking for page segmentation modes in Tesseract:

From the Tesseract documentation:

Page segmentation method
By default Tesseract expects a page of text when it segments an image. If you're just seeking to OCR a small region, try a different segmentation mode, using the --psm argument. Note that adding a white border to text which is too tightly cropped may also help, see issue 398.

To see a complete list of supported page segmentation modes, use tesseract -h. Here's the list as of 3.21:

  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
8 Treat the image as a single word.
10 Treat the image as a single character.
https://github.com/tesseract-ocr/tessdoc/blob/master/ImproveQuality.md
If you know you will only encounter a subset of the characters available in the language, such as only digits, you can use the tessedit_char_whitelist configuration variable. See the FAQ for an example.

Post by ahk7 » 07 Sep 2020, 12:30

Afaik tesseract 4 should be better, https://tesseract-ocr.github.io/tessdoc/ReleaseNotes#tesseract-release-notes-oct-29-2018---v400 (current version is V4.1.1)
Vis2 probably uses version 3.x?

Just for reference, https://github.com/jbarlow83/OCRmyPDF is a useful program that also uses Tesseract, plus other optimizations, to add OCR to PDFs. It is cross-platform and also works under Windows and/or Windows Subsystem for Linux (WSL); see the docs.

Post by Albireo » 07 Sep 2020, 17:39

Thanks!
iseahound wrote:
07 Sep 2020, 11:51
... You're not really using Vis2 optimally....
How do I use the "Page segmentation method" together with Vis2?

What I like about Vis2 .:
- Can be controlled with commands
- Easy to choose which areas OCR should interpret.
(Did not find anything similar)

What I find more difficult is that the OCR interpretation is not as good as I had hoped. (but maybe now?)

Post by iseahound » 07 Sep 2020, 18:20

Code:

            static q := Chr(0x22)
            _cmd .= q this.tesseract q " --tessdata-dir " q fast q " " q in q " " q SubStr(out, 1, -4) q
            _cmd .= (this.language) ? " -l " q this.language q : ""
            _cmd := ComSpec " /C " q _cmd q
            RunWait % _cmd,, Hide
Add the line _cmd .= " --psm 8", which sets the page segmentation mode to 8:

Code:

            static q := Chr(0x22)
            _cmd .= q this.tesseract q " --tessdata-dir " q fast q " " q in q " " q SubStr(out, 1, -4) q
            _cmd .= (this.language) ? " -l " q this.language q : ""
            _cmd .= " --psm 8"
            _cmd := ComSpec " /C " q _cmd q
            RunWait % _cmd,, Hide
Sorry that Vis2 let you down. I've been meaning to make a large upgrade to the program, but sadly the code base is too big and unmaintainable. I've already begun splitting the different modules it relies on into their own libraries, and started removing the dependency on Gdip_All.

Post by Pulover » 20 Sep 2020, 10:15

This is very cool, indeed. :clap:
I'd like to integrate it into PMC (Pulover's Macro Creator).

Post by iseahound » 20 Sep 2020, 11:23

Sure

Post by phucnguyenphi123 » 01 Nov 2020, 21:22

@iseahound
Hi iseahound, after I installed your zip file and followed your instructions, I got this error. What should I do to fix it? Thanks.
Attachment: Screenshot (24).png (154.57 KiB)

Post by phucnguyenphi123 » 01 Nov 2020, 21:39

After I ran the demo code, I got this error.
Attachment: Screenshot (25).png (184.77 KiB)

Post by karlotto » 08 Nov 2020, 04:34

Hi guys,

I'm very new to AHK (a few weeks playing with it). I need to do some super simple OCR, and a user pointed me to the wonderful Vis2.

I'm trying to use the syntax:

Code:

text := OCR([1, 2, 3, 4])
If I'm correct, 1 and 2 are the first vertex of a rectangular area, while 3 and 4 are not the coordinates of the opposite vertex but the width and height of the rectangle.

My question is: I want to check a rectangle inside a specific window. To do that, as with PixelSearch and ImageSearch, I tried to use WinActivate,
so I tried something like:

Code:

WinActivate, MYWINDOW_NAME ahk_class Qt5QWindowIcon
sleep, 300
CoordMode, Pixel, Window
text := OCR([x, y, W, H])
but OCR still seems to work in the screen coordinate system...

So, how do I use OCR to check a rectangle inside a specific window?
I bet this is a dumb question, but I'm unable to get past this point. Many thanks for your help and time.

EDIT: I see this was already discussed some pages ago. I'll try to solve it with the new info and post my solution.
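
For anyone landing here with the same question, one possible approach is to convert the window-relative rectangle to screen coordinates yourself, since OCR([x, y, w, h]) appears to use the screen coordinate system regardless of CoordMode. The sketch below is untested against Vis2: OCRInWindow is a hypothetical helper, the offsets are placeholders, and it assumes OCR() accepts a screen rectangle as described above.

```autohotkey
#Include <Vis2>

; Hypothetical helper: OCR a rectangle given relative to a window's
; top-left corner by shifting it into screen coordinates.
OCRInWindow(winTitle, relX, relY, w, h) {
    ; WinGetPos returns the window's position in screen coordinates.
    WinGetPos, winX, winY,,, %winTitle%
    if (winX = "")      ; output variables are blank if no window matched
        return ""
    return OCR([winX + relX, winY + relY, w, h])
}

target := "MYWINDOW_NAME ahk_class Qt5QWindowIcon"
WinActivate, %target%
WinWaitActive, %target%,, 2     ; wait up to 2 seconds for activation

; Placeholder rectangle: 200x30 pixels, 10 px right and 20 px down
; from the window's top-left corner.
text := OCRInWindow(target, 10, 20, 200, 30)
MsgBox % text
```

Note that WinGetPos measures from the outer corner of the window, including the title bar and borders, so the offsets should match what Window Spy reports in window-relative mode.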

Post by Proentitus » 02 Feb 2021, 10:54

I'm new to AHK and I'm very excited to use Vis2 in the simple scripts I'm writing. My goal is to take a number from the active window and assign it to a variable (putting it in the clipboard first is fine). The number is, of course, an image and not selectable, hence the need for this OCR.

However, I'm having trouble calling the OCR([x,y,w,h]) function from within a script. I've tried it several different ways. I read through this entire Topic and found several suggestions and tried them all to no avail.

Here are all the different ways I've tried with a comment on each line for what it does.

Code:

#include <Vis2>

^#z:: OCR() ;This activates the OCR tool and allows manual click-n-drag, telling me that the files are setup correctly.
return
^#g:: 
	OCR() ;Activates OCR as expected, but doesn't process further commands until something has been manually click-n-dragged.
	MouseClickDrag, L, 555, 840, 644, 855, 50 ;This is run after the OCR function has been manually completed.
return
^#x:: OCR([555, 855, 99, 15]) ;Does nothing.
return
^#c:: text := OCR([555, 855, 99, 15]) ;Does nothing.
return
^#v:: clipboard := OCR([555, 855, 99, 15]) ;Does nothing, other than clear the clipboard.
return
^#b:: OCR([555, 855, 99, 15]).clipboard() ;Does nothing, other than clear the clipboard.
return
^#a:: OCR("A", , [555, 855, 99, 15]) ;Does nothing.
return
^#s:: text := OCR("A", , [555, 855, 99, 15]) ;Does nothing.
return
^#d:: clipboard := OCR("A", , [555, 855, 99, 15]) ;Does nothing, other than clear the clipboard.
return
^#f:: OCR("A", , [555, 855, 99, 15]).clipboard() ;Does nothing, other than clear the clipboard.
return
Could someone tell me which of those should work and what I'm doing wrong to make them not work? In the Vis2.ahk file, it shows Recent: 2018-04-04 at the top, which I guess indicates the version?

Post by need4speed » 14 Feb 2021, 20:24

Wow, I was looking for a replacement for Capture2Text and I found this pearl. Thanks!
I noticed a minor issue where the OCR works on my second monitor, but the selection rectangle is not shown.

Post by ArrowMaster » 27 Feb 2021, 19:32

Please help, guys, I don't know what to do! I downloaded the zip file and transferred the lib folder to my Lib folder, which is under Documents. When I run the test I'm asked to run (making a new AHK script in the folder that has Vis2.ahk), I get this error message that I don't know what to make of...

Exception thrown!
what: Vis2.provider.Tesseract.preprocess
file: C:\Users\rs-ay\OneDrive\Documents\AutoHotkey\Lib\Vis2.ahk
line: 2179
message: Leptonica not found
extra: C:\Users\rs-ay\OneDrive\Bureau\New folder\Vis2-master\lib\bin\leptonica_util\leptonica_util.exe

I know that Leptonica is in the bin folder, I just don't know what to do with it! HELP

Post by densch » 04 Jul 2021, 05:09

Hello,
how fast is the OCR?

I want to run OCR on a small area of a browser window in the background (so I would have to get the window via its window ID AND run OCR on a small given area there).
The resulting string would be used directly in my script; it would get assigned to a variable.

I would do this about once every second.
Could this OCR script keep up with a rate of about one OCR scan per second,
or does it take much longer?
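
One way to find out on a given machine is simply to time a few calls. The sketch below is an assumption-laden illustration: the region values are placeholders, it only covers an on-screen rectangle (not a background window), and it assumes OCR([x, y, w, h]) returns the text as used elsewhere in this thread. For reference, Albireo reported about 3.5 seconds for one row earlier in the thread, though on a different setup and with file input.

```autohotkey
#Include <Vis2>

region := [100, 200, 150, 30]   ; placeholder screen rectangle [x, y, w, h]
runs := 5
total := 0

Loop, %runs%
{
    start := A_TickCount            ; milliseconds since system start
    text := OCR(region)
    total += A_TickCount - start
}

MsgBox % "Average per OCR call: " . Round(total / runs) . " ms"
```

If the average comes out well above 1000 ms, a once-per-second polling loop would fall behind on that setup.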

Post by freddieventura » 20 Jul 2021, 11:30

Hi there guys,

First of all, thanks for this amazing script, it looks really nice. I have tried the demo and it works, but unfortunately when I try the functions in my own script, it does not work.

Code:

Exception thrown!

what: Vis2.stdlib.toFile
file: C:\dotfiles-win\ahk-scripts\Lib\Vis2.ahk
line: 2566
message: Could not find source image.
extra:
All I have done is download the .zip file and unpack it in my AHK path C:\dotfiles-win\ahk-scripts\
So the demo works great, but when I use this little piece of code:

Code:

#NoEnv
#SingleInstance,Force  
;#InstallKeybdHook
#include <FindText>
#include <Vis2>

Appskey & l::
    MsgBox % OCR("Untitled - Notepad")
return
it gives me that error. It is weird; could it be something to do with permissions?

Thank you

Post by rawkus123 » 28 Nov 2021, 08:24

How do I change psm options when using OCR screen coordinates? Currently I'm trying to read a series of numbers, but the number 8 in the sequence is coming out as a 3.
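
The " --psm 8" change to _cmd shown earlier in the thread edits the Tesseract command line itself, so it probably applies to coordinate-based calls as well. For a run of digits, the digit whitelist mentioned in the quoted Tesseract documentation may also be worth trying. The standalone sketch below calls Tesseract directly instead of going through Vis2; all paths are placeholders, and the flags are standard Tesseract command-line options rather than anything Vis2 exposes. As far as I know, older 3.x builds spell the option -psm, and the whitelist is not honored by the LSTM engine in Tesseract 4.0.

```autohotkey
tesseract := "C:\path\to\tesseract.exe"     ; placeholder path
image     := A_ScriptDir "\digits.png"      ; placeholder: cropped image of the number
outBase   := A_ScriptDir "\digits"          ; Tesseract appends ".txt" itself

q := Chr(0x22)                              ; a double-quote character
cmd := q tesseract q " " q image q " " q outBase q
cmd .= " --psm 7"                                   ; treat the image as a single text line
cmd .= " -c tessedit_char_whitelist=0123456789"     ; only allow digits

; Same quoting pattern as the Vis2 snippet: wrap the whole command for cmd /C.
RunWait, % ComSpec " /C " q cmd q,, Hide

FileRead, text, %outBase%.txt
MsgBox % text
```

The same two options could be appended to _cmd inside Vis2.ahk in the way iseahound showed for " --psm 8".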
