Post by Descolada » 16 Jul 2023, 00:43
@FanaticGuru, you are correct that in some cases pre- and postprocessing might improve results, but, at least in my practice, this isn't generalizable to all applications of OCR. For example, if you are dealing with white text on a black background, you might get much better performance from the OCR engine if you invert the colors of the image, but that same inversion might break OCR in all other cases.
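To illustrate the inversion idea, here is a minimal pure-Python sketch. The nested-list grayscale representation is just for illustration (my own toy format, not anything the OCR libraries use); a real script would use a library such as Pillow or OpenCV:

```python
# Minimal sketch: inverting a grayscale image so white-on-black text
# becomes black-on-white before passing it to an OCR engine.
# The image is a list of rows of 0-255 intensity values (toy format).

def invert(image):
    """Return a new image with every pixel value flipped (255 - v)."""
    return [[255 - v for v in row] for row in image]

# White "text" (255) on a black background (0)...
white_on_black = [
    [0,   0,   0],
    [0, 255,   0],
    [0,   0,   0],
]

# ...becomes black on white after inversion.
print(invert(white_on_black))  # [[255, 255, 255], [255, 0, 255], [255, 255, 255]]
```

Inverting twice returns the original image, so the operation is safe to apply conditionally (e.g. only when the average pixel value is below some brightness threshold).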
If you want maximum image recognition quality, you can improve every part of the OCR process: preprocessing, processing, and postprocessing. Which of these will work or give the best results depends on many factors, such as the quality of the image, the amount of noise, the text language, etc.
Preprocessing: processing the image for best quality before sending it to the OCR engine
1) Use better-quality input images. If you are reading the input image from a file (jpg, png), then increasing the original resolution and decreasing compression gives the OCR engine more information to work with and better results. This is not applicable to FromDesktop, FromWindow etc.
2) Converting to black-and-white, as @FanaticGuru mentioned, is a good technique and is built into many OCR preprocessors. I've used it in previous projects with good results, but those were cases where the image was supposed to be black-and-white in the first place. In such cases it works as a denoiser and might improve recognition accuracy, whereas on a color image (e.g. pink text on a yellow background) it might again totally break recognition (e.g. by converting both the background and the text to white...). Whether this is already integrated into UWP OCR, I don't know.
3) Deskewing - rotating the image so the text lines are horizontal. I believe UWP OCR does this internally, since it returns the TextAngle property, but using a custom algorithm might again give better results.
4) Sharpening, as mentioned by @malcev, is another technique to use. You could increase the contrast, use algorithms to remove salt-and-pepper noise, etc.
5) Line removal: removing straight lines from the input will reduce overall noise and should improve recognition accuracy.
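A rough pure-Python sketch of three of the steps above (black-and-white conversion, salt-and-pepper denoising, and horizontal line removal), using a toy nested-list grayscale format of my own choosing; in practice a library such as OpenCV implements all of these far more efficiently:

```python
# Toy implementations of three OCR preprocessing steps.
# Images are lists of rows of 0-255 grayscale values.

from statistics import median

def binarize(image, threshold=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[0 if v < threshold else 255 for v in row] for row in image]

def median_filter(image):
    """Replace each interior pixel with the median of its 3x3
    neighbourhood, removing isolated salt-and-pepper speckles."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out

def remove_horizontal_lines(image, min_run=10):
    """Whiten any horizontal run of black pixels longer than min_run,
    e.g. table borders or underlines that add noise for the engine."""
    out = [row[:] for row in image]
    for row in out:
        x = 0
        while x < len(row):
            if row[x] == 0:
                start = x
                while x < len(row) and row[x] == 0:
                    x += 1
                if x - start >= min_run:
                    for i in range(start, x):
                        row[i] = 255
            else:
                x += 1
    return out

# Binarization demo: dark pixels go to 0, light pixels to 255.
print(binarize([[30, 230], [240, 25]]))  # [[0, 255], [255, 0]]
```

A typical pipeline would chain these, e.g. `remove_horizontal_lines(median_filter(binarize(img)))`, with the threshold and run-length parameters tuned to the source material.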
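Deskewing can be sketched with a projection-profile search: try candidate angles and keep the one that concentrates the black pixels into the fewest rows. This is an illustrative toy of the general technique, not what UWP OCR actually does internally:

```python
import math

def estimate_skew(image, angles=None):
    """Estimate the skew angle (degrees) of a binary image by rotating
    the black-pixel coordinates by each candidate angle and keeping the
    angle that packs the pixels into the fewest rows (highest sum of
    squared row counts): straight text lines concentrate their pixels
    into few rows, skewed ones smear them across many."""
    if angles is None:
        angles = [a / 2 for a in range(-10, 11)]  # -5.0 .. +5.0 degrees
    points = [(x, y) for y, row in enumerate(image)
                     for x, v in enumerate(row) if v == 0]
    best_angle, best_score = 0.0, -1
    for angle in angles:
        rad = math.radians(angle)
        counts = {}
        for x, y in points:
            # y-coordinate of the point after rotating by -angle (deskewing)
            r = round(y * math.cos(rad) - x * math.sin(rad))
            counts[r] = counts.get(r, 0) + 1
        score = sum(c * c for c in counts.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle

# Toy test: a 40x8 white canvas with one line of black pixels
# skewed by 4 degrees.
width, height = 40, 8
skewed = [[255] * width for _ in range(height)]
for x in range(width):
    skewed[round(x * math.tan(math.radians(4)))][x] = 0

print(estimate_skew(skewed))  # → 4.0, the skew we introduced
```

Once the angle is known, the image itself would be rotated by the negative of it; real implementations also use the Hough transform or moments for this step.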
OCR engine selection is also important. Different OCR engines differ in the type and size of their neural networks, the size of the training data, etc. Tesseract is one of the better ones to use, but UWP OCR is built into Windows, so it's easier to use and will keep your script more compact size-wise (there is no separate engine to download).
Also, there are usually different models for different languages, so if you are not working with English you should use the appropriate language model.
Postprocessing the text is the final step. This means correcting mistakes in the output text: using word banks to fix typos (e.g. "imege" -> "image"), using n-grams or Markov chains to predict likely words, computing Levenshtein distances to find the closest dictionary word, etc. There are many methods to choose from, which you can find by Googling "OCR postprocessing methods".
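As an illustration of word-bank postprocessing, here is a minimal Levenshtein-based corrector. The three-word bank is a toy example of mine; real postprocessing would use a full dictionary, and ideally word frequencies, for the target language:

```python
# Toy dictionary-based OCR postprocessing: replace each output word
# with the closest word-bank entry if it is within a small edit distance.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def correct(word, word_bank, max_distance=2):
    """Return the nearest word-bank entry within max_distance edits,
    or the original word if nothing in the bank is close enough."""
    best = min(word_bank, key=lambda w: levenshtein(word, w))
    return best if levenshtein(word, best) <= max_distance else word

word_bank = ["image", "text", "recognition"]
print(correct("imege", word_bank))  # "image" (one substitution away)
```

The `max_distance` cutoff matters: without it, every piece of garbage output would be "corrected" to some unrelated dictionary word.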
I would say that most of these techniques are rather advanced, and I probably wouldn't use AHK to implement them, mostly because the number of available AHK libraries for image and text processing is still small, and image processing can require heavy computation for which AHK isn't well suited. Python or C++ might be better options for this.