Vis2 - Image to Text OCR()

iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04
Contact:

Re: Vis2 - OCR(), ImageIdentify()

25 Apr 2018, 02:06

I am looking into it.

EDIT: If you are curious, you can see my current work on the meta branch of my GitHub repository. There is a lot of work ahead, from writing GDI/C code to UX design and data manipulation. You can call Tesseract using Vis2.provider.Tesseract.TextRecognize().json().file("Tesseract.txt"). The base function TextRecognize() returns a string of text, but chaining json() returns the JSON object instead, and file() then saves it to a file. If there are problems, the cause may be either an incomplete feature or a mistake. If you think it is a bug, let me know.
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04
Contact:

Re: Vis2 - OCR(), ImageIdentify()

26 Apr 2018, 05:05

Some open thoughts:

In practice, the best image processing routine seems to be upscaling -> binarization -> Tesseract OCR, which is the routine currently in use. However, as powerful as leptonica can be, its upscaling leaves much to be desired. nnedi3 and waifu2x are very good, with xBR (scale by rules) being decent as well. nnedi3 was developed as a deinterlacer for interlaced video, which on old televisions looks something like this. Because an interlaced field contains only half the lines of the image, nnedi3 uses a neural network (with pre-computed weights, important for later) to interpolate the missing pixels. Because of this, nnedi3 also works extremely well as an image doubler. Waifu2x was developed later, also using neural networks, and has a pleasant "trace" or signature of the kind that many image interpolation algorithms leave behind. However, it does not use hard coded weights and cannot run in real time, whereas nnedi3 can. Still, Lanczos gives pretty good results in my opinion, but since the source image will be upscaled by more than 2x, using an image doubler will improve the quality dramatically.

Image binarization is very good. Leptonica uses the methods of Sauvola, Wolf, et al., which are arguably the best binarization algorithms out there. There is a possibility that leptonica performs binarization first, then upscaling, which would give inferior results.

After the binarization step, a clean-up step could be introduced. This might be as simple as denoising small particles, or using MSER (Maximally Stable Extremal Regions) to detect text regions. The question is whether such a complex step would be better as the first step: SIFT/MSER -> crop -> upscaling -> binarization -> denoise -> OCR. Regardless, it might require OpenCV, or I'd have to compile the code myself.

What's especially interesting is how these algorithms were developed. xBR is a pixel shader for people who love to emulate old PS1 games. Waifu2x is used to upscale low-quality images of "waifus", 2D girls that its users find attractive. Tesseract is an open source project, now led by Google, but Google's Vision API is far superior to whatever can come out of Tesseract. Still, it's possible that they are using Tesseract in some form. leptonica is a library written in C for image processing studies. Lanczos is an upscaler built on a windowed sinc() function, where sinc(x) = sin(x)/x. It's better than BiCubic and Catmull-Rom. Most of these algorithms are implemented in madVR, a video renderer which can be found on the doom9 forums. The doom9 forums are full of videophiles who love video quality, and there are lots of really good projects on there. SIFT and MSER are modern versions of DoG (Difference of Gaussian), which itself is an approximation of LoG (Laplacian of Gaussian), and of Harris corners. These algorithms can be used to detect "hotspots" or interesting parts of an image. These points can then be tracked in real time, for example on Microsoft's HoloLens mixed reality headset. SIFT = Scale Invariant Feature Transform. Still, it's quite ironic that many of these techniques were simply developed by people who love watching anime, people who love their old video games, and people interested in math. Unless these ideas are mathematically pure, in the sense that at some point or another someone will stumble upon the mathematics, very few of these techniques were developed with a sense of purpose other than "I want to enjoy my video games/movies/anime more".
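
To make the Lanczos remark concrete, here is a minimal sketch of the Lanczos kernel in AHK v1.1. It uses the normalized form sinc(x) = sin(pi*x)/(pi*x), which is the usual convention for resampling. The function names Sinc() and Lanczos() and the window size a = 3 are choices made for this illustration only; nothing here is taken from Vis2, leptonica, or madVR.

```autohotkey
; Illustration only: these helpers are hypothetical and not part of Vis2.

; Normalized sinc: sin(pi*x) / (pi*x), with the limit value 1 at x = 0.
Sinc(x) {
    static pi := 3.141592653589793
    if (x = 0)
        return 1.0
    return Sin(pi * x) / (pi * x)
}

; Lanczos kernel with window size a: sinc(x) * sinc(x/a) inside (-a, a), 0 outside.
Lanczos(x, a := 3) {
    if (Abs(x) >= a)
        return 0.0
    return Sinc(x) * Sinc(x / a)
}

; Show the weights a resampler would apply to source pixels around a
; sample point that lies halfway between two pixels.
out := ""
for _, offset in [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
    out .= "offset " offset ": weight " Round(Lanczos(offset), 4) "`n"
MsgBox % out
```

A resampler computes each output pixel as the weighted sum of nearby source pixels using these weights, normally after normalizing them so they add up to 1.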

EDIT: possible deskewer? Also I'll probably take a break from this. It's a pretty elaborate pet project, but still a pet despite it being warm and fuzzy.
robodesign
Posts: 934
Joined: 30 Sep 2017, 03:59
Location: Romania
Contact:

Re: Vis2 - OCR(), ImageIdentify()

27 Apr 2018, 16:35

Thanks for looking into these things.

Since I discovered this project I've been thinking about how it could be transformed into something I could use daily and easily to read small parts of the screen when I need, "on demand". I dislike that I often need to zoom in (the entire screen) for just a few words I can't read otherwise.

I looked even into the code of Capture2Text, to see how it detects the text that needs to be captured underneath the mouse cursor. As far as I was able to understand, it searches for nearby pixels underneath the cursor and enlarges the area of the rectangle until each of its edges is of equal (or similar) pixels (color-wise). I'm not sure I understood the code well, so apologies if I am wrong.

However, it seems like a relatively simple solution to identify the bounding box / the text boundaries which is underneath the mouse cursor.

Of course, the solution is feasible only for simple, one colored backgrounds. But given most UIs are like this, I'd take this route....

PS. For now, I did not get my hands into the code of Vis2, because it's beyond my level of expertise in coding.... (KeyPress is the first thing I coded.... ). I might try to do so, but gradually. However, if you will implement this feature I need, I will probably not...

Best regards, Marius.
drizzt
Posts: 31
Joined: 20 Apr 2018, 20:44

Re: Vis2 - OCR(), ImageIdentify()

29 Apr 2018, 13:29

I am trying to use OCR like this to determine the screen positions of certain objects. For example, how could I determine the screen position of the text "Rack-512.30.07" in the image below?
https://imgur.com/3Achjb5
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04
Contact:

Re: Vis2 - OCR(), ImageIdentify()

29 Apr 2018, 18:27

Performance might be slow or inaccurate. You could try the meta branch at https://github.com/iseahound/Vis2/tree/meta and run Vis2.provider.Tesseract.TextRecognize("window name").

Code: Select all

data := Vis2.provider.Tesseract.TextRecognize("Rack Details") ; Check if Rack Details is the correct window name. 
FullData := data.FullData ; returns an array of objects. 
MsgBox % FullData[1].category ; This shows the text phrase of the first block. We need to iterate over the blocks/categories using a for-loop. 

polygon := "" ; Stays empty if no block matches. 
for i, value in FullData {
    if (value.category ~= "i)512\.30\.07") { ; you could also use FullData[i].category instead of value.category. 
        polygon := value.polygon ; Polygon Object. 
        break ; Stop at the first matching block. 
    }
}

; A polygon object is a collection of 4 points. Imagine a rectangle with the labels A, B, C, D. Each letter has an x and a y value. 
MsgBox % polygon[1].x ", " polygon[1].y
MsgBox % polygon[2].x ", " polygon[2].y
MsgBox % polygon[3].x ", " polygon[3].y
MsgBox % polygon[4].x ", " polygon[4].y
This should work, but the meta branch is unstable because it contains new and experimental features.

I also recommend debugging with the JSON output: Vis2.provider.Tesseract.TextRecognize("Rack Details").json().clipboard(). This will copy the JSON to the clipboard. Paste it here: https://codebeautify.org/jsonviewer and click on the tree viewer to see all the data that can be extracted from data.FullData (Hint: there's a lot!)

If you are confused as to what the blocks and categories are, try running Vis2.provider.Tesseract.TextRecognize() with an empty input. This will bring you into the GUI mode, where the red rectangles show you the points. If the red rectangles are too big, they can be broken down into paragraphs, lines, and words. Again, you should try looking at the JSON; it is identical to the FullData object.
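
Once a polygon has been found, it is often easier to work with a plain rectangle. The sketch below reduces a polygon (an array of points with x and y, as described above) to its bounding box. PolygonBounds() is a hypothetical helper, not part of Vis2, and the sample polygon is made-up stand-in data so the snippet runs without the library. Whether the coordinates are relative to the screen or to the captured window is not stated in this thread, so check that against your own output.

```autohotkey
; Stand-in data shaped like value.polygon from the example above (made-up numbers).
polygon := [{x: 40, y: 100}, {x: 180, y: 100}, {x: 180, y: 118}, {x: 40, y: 118}]

box := PolygonBounds(polygon)
MsgBox % "x=" box.x " y=" box.y " w=" box.w " h=" box.h ; shows x=40 y=100 w=140 h=18

; Hypothetical helper (not part of Vis2): returns the axis-aligned
; bounding box of an array of {x, y} points as {x, y, w, h}.
PolygonBounds(polygon) {
    minX := "", minY := "", maxX := "", maxY := ""
    for _, pt in polygon {
        if (minX = "" || pt.x < minX)
            minX := pt.x
        if (minY = "" || pt.y < minY)
            minY := pt.y
        if (maxX = "" || pt.x > maxX)
            maxX := pt.x
        if (maxY = "" || pt.y > maxY)
            maxY := pt.y
    }
    return {x: minX, y: minY, w: maxX - minX, h: maxY - minY}
}
```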
Last edited by iseahound on 29 Apr 2018, 18:47, edited 2 times in total.
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04
Contact:

Re: Vis2 - OCR(), ImageIdentify()

29 Apr 2018, 18:46

So I did some further testing on your image, and it seems that performance would be very inaccurate. Given that the OCR can only copy the text that it sees, and you have a scrolling treeview, I'm not sure that this would be suitable for your task. Is there any reason you cannot extract the data directly from your application? OCR is inaccurate and frustrating to use, and I personally only use it to copy and paste text from certain websites that disable right clicking and control + c.
drizzt
Posts: 31
Joined: 20 Apr 2018, 20:44

Re: Vis2 - OCR(), ImageIdentify()

30 Apr 2018, 18:26

Thanks for taking the time to investigate and reply.
Originally I was trying to access the tree directly, but could not find any useful info. All the discussions I found on the AHK forums about getting and working with the control of an external listview either point to broken links or are outdated.
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04
Contact:

Re: Vis2 - OCR(), ImageIdentify()

30 Apr 2018, 19:19

You shouldn't be extracting information from a GUI in the first place. The GUI is getting its information via some sort of command. Find the command, then intercept it.
bwasserman1
Posts: 1
Joined: 11 May 2018, 08:15

Re: Vis2 - OCR(), ImageIdentify()

11 May 2018, 08:34

I am new to using this OCR program and have a simple question. I have searched the forum and cannot find any information that tells me how I can get the location of an image found on the searched page. That is, I am looking for a text box, which I can find, and after the box is found I want to have the mouse go to it so I can insert some text.

I do not see how the Vis2 OCR function can pass me these coordinates.

I would appreciate any help.

Thank you
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04
Contact:

Re: Vis2 - OCR(), ImageIdentify()

11 May 2018, 17:08

You'll have to use the meta branch at https://github.com/iseahound/Vis2/tree/meta.

Run one of these two commands:

Code: Select all

data1 := Vis2.provider.Tesseract.TextRecognize().FullData ; FullData is an object!
data2 := Vis2.provider.Google.TextRecognize().FullData
Then you'll have to parse through the data. Example: data1[1].category will return the text in the first text block. data1[1].score will return the confidence value. data1[1].polygon will return an array of vertices. Access each individual value like: data1[1].polygon[1].x and data1[1].polygon[1].y

Here is a for loop example (untested)

Code: Select all

for i in data1 {
    MsgBox % data1[i].category
    MsgBox % data1[i].score
    for j in data1[i].polygon {
        MsgBox % data1[i].polygon[j].x ", " data1[i].polygon[j].y
    }
}
A word of warning: OCR is much less accurate than you might think. Be prepared to write error-handling code for the cases where it fails.
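
As a rough sketch of the remaining steps (find the block, handle the failure case, move the mouse), the snippet below works on an array shaped like the FullData object described above. FindBlock() and PolygonCenter() are hypothetical helpers, not part of Vis2, and the blocks array is made-up stand-in data so the snippet runs without the library. It also assumes the polygon coordinates are screen coordinates, which this thread does not confirm.

```autohotkey
; Stand-in for data1 (FullData): made-up values in the shape described above.
blocks := [{category: "Username", score: 90
          , polygon: [{x: 200, y: 300}, {x: 320, y: 300}, {x: 320, y: 324}, {x: 200, y: 324}]}]

block := FindBlock(blocks, "i)username")
if !IsObject(block) {
    ; Failure path: OCR may have misread the text, or the text is not on screen.
    MsgBox % "Text not found."
    return
}

center := PolygonCenter(block.polygon)
CoordMode, Mouse, Screen ; Assumption: the coordinates are relative to the screen.
MouseMove, % center.x, % center.y
return

; Hypothetical helper: first block whose text matches a regex, or "" if none does.
FindBlock(blocks, pattern) {
    for _, block in blocks {
        if (block.category ~= pattern)
            return block
    }
    return ""
}

; Hypothetical helper: average of the polygon's vertices, rounded to whole pixels.
PolygonCenter(polygon) {
    sumX := 0, sumY := 0, n := 0
    for _, pt in polygon {
        sumX += pt.x
        sumY += pt.y
        n += 1
    }
    if (n = 0)
        return ""
    return {x: Round(sumX / n), y: Round(sumY / n)}
}
```

With real data, replace blocks with the FullData object and follow MouseMove with a Click if the goal is to focus the text box.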
renmacro
Posts: 22
Joined: 05 Mar 2018, 23:30

Re: Vis2 - OCR(), ImageIdentify()

14 May 2018, 18:29

I am getting the error - could not find source image. I believe I followed the setup instructions properly. What am I missing? Thanks for any help!
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04
Contact:

Re: Vis2 - OCR(), ImageIdentify()

14 May 2018, 19:15

Did you download the full package from GitHub? https://github.com/iseahound/Vis2/archive/master.zip I don't know what code you are running, so I can't give any detailed help.
renmacro
Posts: 22
Joined: 05 Mar 2018, 23:30

Re: Vis2 - OCR(), ImageIdentify()

14 May 2018, 21:50

iseahound wrote:Did you download the full package from GitHub? https://github.com/iseahound/Vis2/archive/master.zip I don't know what code you are running, so I can't give any detailed help.
iseahound yes it is the latest 4/4 code. Thank you for the response!

It seems like when I run the code with variables - OCR([x,y,wx,wy]) or OCR([%x%,%y%,%wx%,%wy%]) it fails, but when I type in numbers - OCR([10,10,10,10]) it works

Specific error is:
Exception thrown!
what: Vis2.stdlib.toFile
file:
C:\Users\<me>\Dropbox\Rebuilding\Vis2-master\lib\Vis2.ahk
line: 2566
message: could not find source image
extra:
Last edited by renmacro on 14 May 2018, 22:01, edited 1 time in total.
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04
Contact:

Re: Vis2 - OCR(), ImageIdentify()

14 May 2018, 22:01

Code: Select all

#include <Vis2>  ; Equivalent to #include .\lib\Vis2.ahk
x := 100
y := 100
w := 500
h := 500
MsgBox % OCR([x, y, w, h])
This works fine for me
renmacro
Posts: 22
Joined: 05 Mar 2018, 23:30

Re: Vis2 - OCR(), ImageIdentify()

14 May 2018, 22:06

iseahound wrote:

Code: Select all

#include <Vis2>  ; Equivalent to #include .\lib\Vis2.ahk
x := 100
y := 100
w := 500
h := 500
MsgBox % OCR([x, y, w, h])
This works fine for me
It must not like my variable contents. Just setting variables to numbers like you did worked great. I must have null variables or something else wrong. Thank you so much! I'll check out what may be wrong!

EDIT: Confirmed, I had null variables, thank you again!
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04
Contact:

Re: Vis2 - OCR(), ImageIdentify()

14 May 2018, 22:13

Make sure your variable isn't a string. If it is try MsgBox % OCR([x+0, y+0, w+0, h+0])
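
Note that in AHK v1.1 adding 0 converts a numeric string to a number, but an empty variable stays empty, because math on a blank value yields a blank result. A small guard can catch both cases before the OCR call. ToRect() below is a hypothetical helper, not part of Vis2, and the commented-out OCR() line only marks where the real call would go.

```autohotkey
x := "10"  ; A numeric string, as you might get from a file or a GUI control.
y := 10
wx := ""   ; Simulates the empty variable from the error report above.
wy := 10

rect := ToRect(x, y, wx, wy)
if IsObject(rect) {
    ; text := OCR(rect) ; The real call would go here (requires #include <Vis2>).
    MsgBox % "OK: " rect[1] ", " rect[2] ", " rect[3] ", " rect[4]
} else {
    MsgBox % "Not calling OCR(): one of x, y, w, h is empty or not a number."
}

; Hypothetical helper: returns [x, y, w, h] as pure numbers,
; or "" if any of the four values is empty or non-numeric.
ToRect(x, y, w, h) {
    rect := []
    for _, v in [x, y, w, h] {
        if v is not number
            return ""
        rect.Push(v + 0) ; +0 turns a numeric string into a number.
    }
    return rect
}
```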
renmacro
Posts: 22
Joined: 05 Mar 2018, 23:30

Re: Vis2 - OCR(), ImageIdentify()

14 May 2018, 22:16

iseahound wrote:Make sure your variable isn't a string. If it is try MsgBox % OCR([x+0, y+0, w+0, h+0])
Oh very good! I was always kind of curious about how AHK handled variable types. I'll keep an eye out for that!
renmacro
Posts: 22
Joined: 05 Mar 2018, 23:30

Re: Vis2 - OCR(), ImageIdentify()

15 May 2018, 12:26

In my application tesseract is confusing 0's with 8's at random times, since there is the cross line on the zero. Since I am using it in code and not the GUI, I am assuming it uses tessdata_best already as listed in OP?

Is the only way to fix this to further train tesseract with a specific image differentiating the 0 and 8? My use requires just a single specific font.
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04
Contact:

Re: Vis2 - OCR(), ImageIdentify()

15 May 2018, 15:24

Training is possible, but I haven't included the necessary command line utilities. You're right that training may help, but unless you have some wacky, specific font, training is unlikely to beat the pre-trained tessdata_best model. Since your 0 just has a line through it, which is pretty common for a zero, it's probably the upscaling step that's ruining it. I suspect it reads as 0 or 8 about half the time? The image is upscaled 3.5x before processing, so changing that to 2 or 2.5 may help. I think you can Ctrl+F for 3.5 in the script. Finally, there are a lot of improvements that can be made, and I intend to make them sometime in the future.
iseahound
Posts: 1434
Joined: 13 Aug 2016, 21:04
Contact:

Re: Vis2 - OCR(), ImageIdentify()

15 May 2018, 15:26

For a more exact FindText on screen you should try feiyue's project.
