Optical character recognition (OCR) with UWP API

malcev · Post by **malcev** » 09 Feb 2022, 17:38

if (text = "loop")
{
   n++
   if (n = 2)   ; second time
      MouseMove, X, Y, 5
}

SteveMylo · Post by **SteveMylo** » 10 Feb 2022, 01:26

@malcev Amazing, malcev! So simple in hindsight.
I'll adapt this to many of my other scripts. Thanks again.
Trying to get it to work across two monitors now.
One is hBitmap := HBitmapFromScreen(0, 0, A_ScreenWidth, A_ScreenHeight)
and the other is hBitmap := HBitmapFromScreen(-3840, -480, A_ScreenWidth, A_ScreenHeight)
Although this works, you have to know which screen the text will appear on, which doesn't suit my purposes.
I need it to look through all the screens but as you said, the search begins at 0,0 coordinates.
And yes, if it's on the 4k screen then the script has to have a pre-adjustment to the mousemove coordinates e.g. MouseMove, X-3840, Y-480, 3

Do you know of an OCR script that does this?
If not, then I understand I have to throw in a few if statements: if the text is NOT found on one screen, then search the other, I guess.
cheers

malcev · Post by **malcev** » 10 Feb 2022, 04:29

I think no OCR knows anything about the number of monitors.
You send bitmap bits to the OCR, it works with them and returns coordinates, and you have to decide for yourself which monitor those coordinates belong to.

SteveMylo · Post by **SteveMylo** » 10 Feb 2022, 04:32

@malcev Yeah, I think I just realised that.
I just got it working by pointing the OCR at two different monitors, searching one at a time.
So there is a small delay before the second search, yes.

malcev · Post by **malcev** » 10 Feb 2022, 04:37

Why don't you send one hBitmap that consists of these 2 monitors?

SteveMylo · Post by **SteveMylo** » 10 Feb 2022, 05:17

@malcev Well, firstly, I don't know how. I'm new to OCR and the coding involved.
But I did speed up the search a lot by narrowing the range of search coordinates on each monitor.

Regarding the 'bitmap of 2 monitors', when I search the 4k monitor, the mousemove has to be MouseMove, X-3840, Y-480, 3
Whereas on the 1080 screen, the mousemove has to be MouseMove, X, Y, 3
This is why I can't just search a broad range with one single search: my mousemove has to land on the target exactly.

So I'm not sure if one big hBitmap would work. But you are obviously a pro at this and I'm not, so I could definitely be wrong.
If you think it would, then I would love to see how I could implement that.

FYI my screen size of the 4k monitor is x/y/w/h = -3840, -480, 3077, 1733
and 1080 screen is obviously 0, 0, 1920, 1080

malcev · Post by **malcev** » 10 Feb 2022, 05:30

Try to create one big bitmap from your 2-monitor configuration and send it to the OCR.
If there are parts of it that the OCR should not recognize, you can fill those parts with white.
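
A minimal sketch of that suggestion (AutoHotkey v1), assuming the capture covers the whole virtual screen and the OCR reports positions relative to the top-left corner of the captured bitmap. `BitmapToScreen` and the sample values `ocrX`/`ocrY` are hypothetical names for illustration, and `HBitmapFromScreen` is the helper already used in this thread. With one capture rectangle, the same offset applies to every hit, so no per-monitor if statements should be needed.

```autohotkey
#NoEnv
CoordMode, Mouse, Screen   ; MouseMove should use absolute screen coordinates

; Bounds of the virtual screen (the rectangle that spans all monitors).
SysGet, vx, 76   ; SM_XVIRTUALSCREEN: left edge, may be negative
SysGet, vy, 77   ; SM_YVIRTUALSCREEN: top edge, may be negative
SysGet, vw, 78   ; SM_CXVIRTUALSCREEN: total width
SysGet, vh, 79   ; SM_CYVIRTUALSCREEN: total height

; With the thread's helper, the capture would be:
;   hBitmap := HBitmapFromScreen(vx, vy, vw, vh)
; The OCR result is then relative to that bitmap's top-left corner.

ocrX := 100, ocrY := 50   ; hypothetical OCR hit, in bitmap coordinates
BitmapToScreen(ocrX, ocrY, vx, vy, screenX, screenY)
MsgBox, % "Bitmap " ocrX "," ocrY " -> screen " screenX "," screenY
; MouseMove, %screenX%, %screenY%, 3
return

; Shift bitmap-relative coordinates by the origin of the captured rectangle.
BitmapToScreen(bx, by, originX, originY, ByRef sx, ByRef sy) {
    sx := originX + bx
    sy := originY + by
}
```

This matches the manual adjustment mentioned above: a capture that starts at -3840, -480 needs X-3840, Y-480, while a capture that starts at 0, 0 needs no adjustment. It assumes the capture and the mouse use the same coordinate units; on setups with different DPI scaling per monitor, further adjustment may be needed.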

AlFlo · Post by **AlFlo** » 31 Mar 2022, 19:13

I'm trying to take a screencapture from within a .pdf document and then OCR the clip. I tried using Teadrinker and malcev's scripts, but can't get them working. (I'm just trying to OCR from English to English, not use multiple languages)

Joe Glines' Window Snipping tool (mentioned in this thread) works great for everything EXCEPT .pdfs. Specifically, I use Foxit PDF Editor, and for some reason his Window Snipping tool doesn't work with Foxit.

I'm looking for something simple:

(1) Using Windows 10's built-in screen capture tool (i.e. Win+Shift+S, which reliably sends the specified area of capture to the clipboard, even in a .pdf using Foxit)

(2) Then using an AHK script to OCR the clip from my clipboard and save the OCR'd text to my clipboard.

Any help greatly appreciated!

SteveMylo · Post by **SteveMylo** » 31 Mar 2022, 19:44

@AlFlo Hey there, the Vis2 script works amazingly well. On my PDFs too. Here is the link to the thread: viewtopic.php?f=6&t=36047&hilit=Vis2

malcev · Post by **malcev** » 01 Apr 2022, 03:00

AlFlo, what do you mean by "can't get them working"?
The first script uses a file path for recognition; the second uses a screenshot.

AlFlo · Post by **AlFlo** » 01 Apr 2022, 13:42

malcev,

I got it working for a little while using Win + Ctrl + C as the hotkey. But then it stopped working again.

Specifically, when I reinstalled the script, it did what it did originally: I could get the + symbol for starting screen capture, but - no matter what I did - nothing happened until finally I got an error message about not being able to handle the Russian language (i.e. "ru").

I tried changing "ru" to "en", but still nothing worked.

What hotkeys should I be using? I see Control x in the script, but no matter what I do with the cursor, I can't highlight the area I'm trying to OCR.

AlFlo · Post by **AlFlo** » 01 Apr 2022, 13:44

SteveMylo, do I need to install additional libraries (e.g. Gdip_All, tesseract and leptonica?)

SteveMylo · Post by **SteveMylo** » 01 Apr 2022, 14:55

@AlFlo You need a couple of libraries, yes, but I can't remember which. He came out with a new version this month, so Gdip_All is replaced with something else. Follow the links; the latest update is somewhere in there.

SteveMylo · Post by **SteveMylo** » 01 Apr 2022, 18:00

@AlFlo Here is the thread of the New Update ===> viewtopic.php?p=449254#p449254

AlFlo · Post by **AlFlo** » 01 Apr 2022, 18:41

Because - perhaps due to some idiosyncrasy of the settings of my computer - malcev's script only works about half of the time for me, it would be very helpful just to have a script which does the OCR from the built-in Windows 10 snip tool, since that tool works 100% of the time for me.

Can someone point out where in teadrinker/malcev's clip-OCR script the actual OCR starts? Maybe I could adapt that to use the image already in the clipboard, which I placed there using the Windows 10 snip tool.

AlFlo · Post by **AlFlo** » 04 Apr 2022, 10:35

In case anyone is searching for a way to OCR the image saved in a clipboard, flyingDman provided the answer here: viewtopic.php?f=76&t=102218&p=454542#p454542

Post by **neogen** » 08 Apr 2022, 16:37

Hi! Thanks for this fantastic OCR tool, extremely accurate and fast. I'm fairly new to AHK and programming in general, but as a student I am extremely curious and always looking for new methods and tools to improve, so I found this incredible tool while snooping around the forum.

I was wondering if it is possible to get the coordinates of the lines where text was found and recognized, since the function is able to get and count those lines. Is it maybe possible to get something like ImageSearch, which returns the X and Y coordinates of the upper left pixel of where the image was found on the screen?

Regards and all the best to you, and thanks in advance for any information or suggestion.

AlFlo · Post by **AlFlo** » 09 Apr 2022, 12:50

Hi Neogen,

I'm a newbie myself. But I think MouseGetPos would allow you to get the x and y coordinates of the upper left pixel of where the image was found on the screen. See https://www.autohotkey.com/docs/commands/MouseGetPos.htm

Specifically, if you add MouseGetPos in your script at the point where you start capturing the image, that could work.

SteveMylo · Post by **SteveMylo** » 09 Apr 2022, 16:23

AlFlo wrote (09 Apr 2022, 12:50):
Specifically, if you add MouseGetPos in your script at the point where you start capturing the image, that could work.

Hi, that won't work at all unless the mouse physically moves to the word first.

Post by **neogen** » 10 Apr 2022, 10:10

Hi, thanks for the reply. I don't know if I'm correct, but I think that the actual OCR part happens here:

BitmapFrameWithSoftwareBitmap := ComObjQuery(BitmapDecoder, IBitmapFrameWithSoftwareBitmap := "{FE287C9A-420C-4963-87AD-691436E08383}")
DllCall(NumGet(NumGet(BitmapFrameWithSoftwareBitmap+0)+6*A_PtrSize), "ptr", BitmapFrameWithSoftwareBitmap, "ptr*", SoftwareBitmap)   ; GetSoftwareBitmapAsync
WaitForAsync(SoftwareBitmap)
DllCall(NumGet(NumGet(OcrEngine+0)+6*A_PtrSize), "ptr", OcrEngine, "ptr", SoftwareBitmap, "ptr*", OcrResult)   ; RecognizeAsync
WaitForAsync(OcrResult)
DllCall(NumGet(NumGet(LinesList+0)+6*A_PtrSize), "ptr", LinesList, "int", A_Index-1, "ptr*", OcrLine)

On the UWP API website it's written that "to use the OCR capabilities of the OcrEngine class in your app, call the RecognizeAsync method. When you call the RecognizeAsync method of the OcrEngine class, the method returns an OcrResult object, which contains the recognized text and its size and position. The result is split into lines, and the lines are split into words. Each OcrLine object contains a collection of OcrWord objects, accessible via Words property of each OcrLine, and each OcrWord object specifies the text, size, and position information of the word in the image"

So I know the information I'm looking for is in there: it is one of the properties of the first OcrWord object inside each line. The website calls this property "BoundingRect", and it holds the position and size, in pixels, of the recognized word, measured from the top-left corner of the image when the value of the TextAngle property is 0 (zero). But how can I access this information for each line? This is as far as I can get, so thanks in advance for any information or suggestion.
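
One possible way to read that property, as a sketch only, following the same DllCall/vtable pattern as the excerpt above. `GetFirstWordRect` is a hypothetical helper, not part of the posted script. The vtable slots used here are assumptions: they rely on the six IInspectable methods occupying slots 0 to 5, on Words being the first member of IOcrLine, on GetAt being the first member of the words list, and on BoundingRect being the first member of IOcrWord. It also assumes BoundingRect is a Windows.Foundation.Rect, i.e. four 32-bit floats (X, Y, Width, Height).

```autohotkey
; Hypothetical helper: returns the bounding box of the first word of one OcrLine.
; OcrLine is the pointer obtained from LinesList in the excerpt above.
GetFirstWordRect(OcrLine) {
    ; IOcrLine::get_Words (assumed slot 6) -> list of OcrWord objects
    DllCall(NumGet(NumGet(OcrLine+0)+6*A_PtrSize), "ptr", OcrLine, "ptr*", WordsList)
    ; GetAt(0) on the words list (assumed slot 6) -> first OcrWord
    DllCall(NumGet(NumGet(WordsList+0)+6*A_PtrSize), "ptr", WordsList, "uint", 0, "ptr*", OcrWord)
    ; IOcrWord::get_BoundingRect (assumed slot 6) fills a 16-byte Rect
    VarSetCapacity(rect, 16, 0)
    DllCall(NumGet(NumGet(OcrWord+0)+6*A_PtrSize), "ptr", OcrWord, "ptr", &rect)
    result := { x: NumGet(rect, 0, "float")
              , y: NumGet(rect, 4, "float")
              , w: NumGet(rect, 8, "float")
              , h: NumGet(rect, 12, "float") }
    ; Release the interface pointers obtained in this function.
    ObjRelease(OcrWord)
    ObjRelease(WordsList)
    return result
}
```

Inside the loop over the lines, `r := GetFirstWordRect(OcrLine)` would then give `r.x` and `r.y` relative to the top-left corner of the image that was sent to the OCR, not to the screen, so the origin of the captured area still has to be added before using them with MouseMove.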
