Vis2 - Image to Text OCR()

paulpma · 19 May 2020, 01:22

@AHKMode
use

 MouseClickDrag

https://www.autohotkey.com/docs/commands/MouseClickDrag.htm

or

Code: Select all

SendEvent {Click 6, 52, down}{click 45, 52, up}

or

Code: Select all

text:=OCR(x,y,x1,y1)

; see help file.. (last example most likely to work)

AHKMode · 20 May 2020, 01:06

Thanks for the reply @paulpma

^w::
text := OCR()
Sleep, 1000
SendEvent {Click 6, 52, down}{click 45, 52, up}
Sleep, 1000
MsgBox, % text

This is my script, and it is not working: the line SendEvent {Click 6, 52, down}{click 45, 52, up} is only executed after I click the left mouse button. I want to know which command to use so that the OCR() function and the MouseClickDrag method work together. I have searched Google, but I could not put together a script that combines OCR() with something like ControlClick, ControlSend or Control. What should I put in the script to combine the OCR() function with MouseClickDrag/SendEvent?

Hopefully you understand me. Thanks again!

paulpma · 22 May 2020, 08:50

Hi, @AHKMode
I have tested your script and get the same results. I think it's not working because of the way the VIS script was designed. Have you tried the third method?

paulpma · 22 May 2020, 09:03

I am using GDIP with VIS and with another application as well, and I am getting an error. It happens every time after the other application has used the GDIP library and I then manually select text in VIS. The error occurs immediately after the selection. See the error image for details. The line of the error in GDIP is the following:

Code: Select all

DllCall("gdiplus\GdipGetImageEncodersSize", "uint*", nCount, "uint*", nSize)

which is part of function:

Code: Select all

Gdip_SaveBitmapToFile(pBitmap, sOutput, Quality:=75)

Any idea what can cause this issue?

This is the code that is executed prior to using VIS:

Code: Select all

fileName :=  A_YYYY "-" A_MM "-" A_DD "-" A_Hour "-" A_Min "-" A_Sec ".png"
pBitmap := Gdip_Startup()
pBitmap := Gdip_BitmapFromScreen()
saveFileTo := fileName                   
Gdip_SaveBitmapToFile(pBitmap, saveFileTo)
Gdip_DisposeImage(pBitmap)
Gdip_Shutdown(pBitmap)

The whole Gdip_SaveBitmapToFile function:

Code: Select all

Gdip_SaveBitmapToFile(pBitmap, sOutput, Quality:=75)
{
	Ptr := A_PtrSize ? "UPtr" : "UInt"
	nCount := 0
	nSize := 0
	_p := 0

	SplitPath sOutput,,, Extension
	if !RegExMatch(Extension, "^(?i:BMP|DIB|RLE|JPG|JPEG|JPE|JFIF|GIF|TIF|TIFF|PNG)$")
		return -1
	Extension := "." Extension

	DllCall("gdiplus\GdipGetImageEncodersSize", "uint*", nCount, "uint*", nSize)
	VarSetCapacity(ci, nSize)
	DllCall("gdiplus\GdipGetImageEncoders", "uint", nCount, "uint", nSize, Ptr, &ci)
	if !(nCount && nSize)
		return -2

	If (A_IsUnicode){
		StrGet_Name := "StrGet"

		N := (A_AhkVersion < 2) ? nCount : "nCount"
		Loop %N%
		{
			sString := %StrGet_Name%(NumGet(ci, (idx := (48+7*A_PtrSize)*(A_Index-1))+32+3*A_PtrSize), "UTF-16")
			if !InStr(sString, "*" Extension)
				continue

			pCodec := &ci+idx
			break
		}
	} else {
		N := (A_AhkVersion < 2) ? nCount : "nCount"
		Loop %N%
		{
			Location := NumGet(ci, 76*(A_Index-1)+44)
			nSize := DllCall("WideCharToMultiByte", "uint", 0, "uint", 0, "uint", Location, "int", -1, "uint", 0, "int",  0, "uint", 0, "uint", 0)
			VarSetCapacity(sString, nSize)
			DllCall("WideCharToMultiByte", "uint", 0, "uint", 0, "uint", Location, "int", -1, "str", sString, "int", nSize, "uint", 0, "uint", 0)
			if !InStr(sString, "*" Extension)
				continue

			pCodec := &ci+76*(A_Index-1)
			break
		}
	}

	if !pCodec
		return -3

	if (Quality != 75)
	{
		Quality := (Quality < 0) ? 0 : (Quality > 100) ? 100 : Quality
		if RegExMatch(Extension, "^\.(?i:JPG|JPEG|JPE|JFIF)$")
		{
			DllCall("gdiplus\GdipGetEncoderParameterListSize", Ptr, pBitmap, Ptr, pCodec, "uint*", nSize)
			VarSetCapacity(EncoderParameters, nSize, 0)
			DllCall("gdiplus\GdipGetEncoderParameterList", Ptr, pBitmap, Ptr, pCodec, "uint", nSize, Ptr, &EncoderParameters)
			nCount := NumGet(EncoderParameters, "UInt")
			N := (A_AhkVersion < 2) ? nCount : "nCount"
			Loop %N%
			{
				elem := (24+(A_PtrSize ? A_PtrSize : 4))*(A_Index-1) + 4 + (pad := A_PtrSize = 8 ? 4 : 0)
				if (NumGet(EncoderParameters, elem+16, "UInt") = 1) && (NumGet(EncoderParameters, elem+20, "UInt") = 6)
				{
					_p := elem+&EncoderParameters-pad-4
					NumPut(Quality, NumGet(NumPut(4, NumPut(1, _p+0)+20, "UInt")), "UInt")
					break
				}
			}
		}
	}

	if (!A_IsUnicode)
	{
		nSize := DllCall("MultiByteToWideChar", "uint", 0, "uint", 0, Ptr, &sOutput, "int", -1, Ptr, 0, "int", 0)
		VarSetCapacity(wOutput, nSize*2)
		DllCall("MultiByteToWideChar", "uint", 0, "uint", 0, Ptr, &sOutput, "int", -1, Ptr, &wOutput, "int", nSize)
		VarSetCapacity(wOutput, -1)
		if !VarSetCapacity(wOutput)
			return -4
		_E := DllCall("gdiplus\GdipSaveImageToFile", Ptr, pBitmap, Ptr, &wOutput, Ptr, pCodec, "uint", _p ? _p : 0)
	}
	else
		_E := DllCall("gdiplus\GdipSaveImageToFile", Ptr, pBitmap, Ptr, &sOutput, Ptr, pCodec, "uint", _p ? _p : 0)
	return _E ? -5 : 0
}

My knowledge of AHK is not advanced enough to troubleshoot this. Can anyone point me in the right direction? So far I have updated my GDIP library to 1.54. Note that this occurs on Windows 7.

iseahound · 22 May 2020, 15:04

Don't call Gdip_Shutdown() and see if that works.
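A possible explanation, offered as a guess rather than a confirmed diagnosis: in the screenshot snippet above, the token returned by Gdip_Startup() is overwritten by the bitmap pointer, so Gdip_Shutdown() receives something that is not the token. As far as I know, Gdip_Shutdown() in the common Gdip library also unloads gdiplus for the whole script, which would leave any other code that still relies on GDI+ in a bad state. Here is a minimal sketch of the screenshot step without the shutdown. It assumes the Gdip library is already included, and SaveScreenshot is a hypothetical helper name.

```autohotkey
; Sketch only: assumes the Gdip library used in this thread is already included.
pToken := Gdip_Startup()          ; keep the GDI+ token in its own variable

fileName := A_YYYY "-" A_MM "-" A_DD "-" A_Hour "-" A_Min "-" A_Sec ".png"
SaveScreenshot(fileName)
; No Gdip_Shutdown() here: GDI+ stays loaded for whatever else the script uses.

; Hypothetical helper: captures the screen and saves it to fileName.
SaveScreenshot(fileName)
{
	pBitmap := Gdip_BitmapFromScreen()
	result := Gdip_SaveBitmapToFile(pBitmap, fileName)
	Gdip_DisposeImage(pBitmap)    ; free only the bitmap, not GDI+ itself
	return result                 ; 0 on success, negative on failure
}
```

If the script really needs to shut GDI+ down, doing it once at exit with the saved pToken seems the safer place.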

I wrote a new image library to do what you're doing now

paulpma · 22 May 2020, 23:23

iseahound wrote: ↑
22 May 2020, 15:04
Don't call Gdip_Shutdown() and see if that works.

Yes, it worked. I am just wondering what happened in the background to cause this issue?

I wrote a new image library to do what you're doing now

That is awesome that you can do that. I am happy for you. I am wondering, why write another one if GDIP exists? My level is nowhere near creating a library like that or working with DLL calls, but I am learning slowly.

Thank you Iseahound for your help. I really appreciate it.

18 Jun 2020, 19:50

Hey, is there any way to set the array of OCR but relative to the specific window?
Something like OCR([X,Y,W,H], "Notepad") or so.
Or maybe set CoordMode to Window?

22 Jun 2020, 18:57

stfur wrote: ↑
18 Jun 2020, 19:50
Hey, is there any way to set the array of OCR but relative to the specific window?
Something like OCR([X,Y,W,H], "Notepad") or so.
Or maybe set CoordMode to Window?

I think the use of screen coordinates is baked into the GDip library used by Vis2. Your best bet is probably to do something like this:

Code: Select all

WindowOCR(x1, y1, x2, y2, windowName)
{
	WinActivate, %windowName%
	; Top-left corner of the target window, in screen coordinates
	WinGetPos, WinX, WinY,,, %windowName%
	x := WinX + x1
	y := WinY + y1
	w := x2 - x1
	h := y2 - y1
	coords := [x, y, w, h]
	return OCR(coords)
}
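The arithmetic in that function can also be kept in a small pure helper, which makes it easy to check without a window open. This is only a sketch: WindowRectToScreen is a made-up name, and it assumes OCR() accepts a screen-coordinate [x, y, w, h] array as used elsewhere in this thread. Note that WinGetPos reports the outer window corner, so the title bar and borders are included in the offsets.

```autohotkey
; Hypothetical helper: turns two window-relative corner points into a
; screen-coordinate [x, y, w, h] array.
WindowRectToScreen(winX, winY, x1, y1, x2, y2)
{
	; Accept the two corners in any order
	left   := (x1 < x2) ? x1 : x2
	top    := (y1 < y2) ? y1 : y2
	width  := Abs(x2 - x1)
	height := Abs(y2 - y1)
	return [winX + left, winY + top, width, height]
}

; Usage sketch (not run here):
; WinGetPos, wx, wy,,, ahk_exe notepad.exe
; MsgBox % OCR(WindowRectToScreen(wx, wy, 10, 40, 310, 90))
```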

andy5566888 · 23 Jun 2020, 04:40

I have a problem. I have a photo that shows the number one, but the background has many colors. When I use OCR, it can't find any text and outputs NULL.
I want to know how to OCR the number in this photo.

RedRaccoon · 21 Aug 2020, 09:22

stfur wrote: ↑
18 Jun 2020, 19:50
Hey, is there any way to set the array of OCR but relative to the specific window?
Something like OCR([X,Y,W,H], "Notepad") or so.
Or maybe set CoordMode to Window?

I was looking for something similar, but as tom098656's solution didn't work for me (because I suck at coding), I used WinActivate.
Activate the desired window, run the code, then activate your initial window. There is a slight delay, depending on how big an area you need to read.

Code: Select all

Q::  
if WinExist("ahk_exe BlueStacks.exe")
    WinActivate ; use the window found above
	OCR("A", ,[77, 682, 741, 800]).clipboard()
WinExist("ahk_exe firefox.exe") ; back to firefox
    WinActivate ; use the window found above
return

Albireo · 24 Aug 2020, 15:13

I think the project looks exciting.
Analyzing an area on an image seems to work well (have not tested yet).

Will the result be improved with a large, high-resolution image?
How to analyze an area that is larger than the monitor?

What I miss / want is to be able to analyze fields / columns from a PDF file. (one or many pages) Will it be possible?

24 Aug 2020, 18:43

This looks super awesome, any plans to add multi-monitor support for the coordinates? Every time I add coordinates that are off my main monitor screen I get an error "Could not find source image"

24 Aug 2020, 18:58

Albireo wrote: ↑
24 Aug 2020, 15:13
What I miss / want is to be able to analyze fields / columns from a PDF file. (one or many pages) Will it be possible?

Afaik, there are third-party pdf-to-text tools available on the interwebs which, I think, do not rely on OCR (but I haven't used one, at least not recently, and I don't know how they'd handle PDFs with restricted rights).
OCR should perhaps serve as a last resort for this use case.

Albireo · 27 Aug 2020, 06:33

From the little I have tested, Vis2 seems to interpret text very well.
But how do I automatically select an area to convert to text in a .png file?
Is it possible to graphically select the area of an image, and then use the result in an AHK program?
(Something I missed?)

iseahound · 31 Aug 2020, 09:03

I'm sure you can do something like: OCR("mypic.png", , [0,0,100,100]) where the array is the part of the image to crop.

Albireo · 01 Sep 2020, 07:11

iseahound wrote: ↑
31 Aug 2020, 09:03
I'm sure you can do something like: OCR("mypic.png", , [0,0,100,100]) where the array is the part of the image to crop.

Yes! (thanks)
It works to some extent, but I have a problem with the coordinates.
How do I get the coordinates? (Window Spy doesn't show me the right values.)
When I run the OCR program like this:

Code: Select all

FileAppend % OCR(imgFile, "swe", [200,700,100,75]), %resFile%

It scans a surface elsewhere. (I never open the picture before OCR...)

iseahound · 01 Sep 2020, 09:01

If you just want to OCR an image:

Code: Select all

MsgBox % OCR("text.jpg")

If you want to crop to a region, pass [x, y, w, h], where x,y is the top-left corner of the region inside the image and w,h are the width and height of the part to keep. Open the image in an editor like Paint to view the x,y coordinates of your image.

Code: Select all

MsgBox % OCR("text.jpg", "swe", [x,y,w,h])

Botsy · 01 Sep 2020, 10:43

Hi all, maybe you can help me with this:

How do I pass coordinates to OCR() that were taken from the mouse position when a button is pressed?
In general, I want to make a script like this: I move the mouse over the desired area and, when I press the button, the coordinates are written to variables. I use these variables in OCR([]) as the area to scan constantly. Then I compare a predetermined value with what was read in the OCR area.

Code: Select all

 
#include <Vis2> 

HS = 2
F11::

Highlight(HS)

Highlight(TH)
{
	local
  if (x="")
  {
    VarSetCapacity(pt,16,0), DllCall("GetCursorPos","ptr",&pt)
    x:=NumGet(pt,0,"uint"), y:=NumGet(pt,4,"uint")
  }
  x:=Round(x), y:=Round(y)

   w = 25
   h = 25
   Gui, +LastFound +ToolWindow -Caption +AlwaysOnTop
   Gui, Color, Red
   Gui, Show, x%X% y%Y% w%W% h%H% Hide
   Options := "0-0 " W "-0 " W "-" H " 0-" H " 0-0 " TH "-" TH
      . " " W-TH "-" TH " " W-TH "-" H-TH " " TH "-" H-TH " " TH "-" TH
   WinSet, Region, % Options
   Gui, Show, NA
   KeyWait, F11
   Gui, Destroy
}

text := OCR([%x%, %y%, %w%, %h%])
msgBox, % text

return

Esc:: ExitApp

In this script, pressing F11 highlights the area where the cursor is located; then I get the coordinates and write them to variables.
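Two things in the script above look like likely culprits, though I have not run it: x and y only exist inside Highlight() because of the `local` line, and `OCR([%x%, %y%, %w%, %h%])` uses percent signs inside an expression, where plain variable names are expected. A minimal sketch of one way to wire it up follows. BoxAt is a hypothetical helper, the hotkey part is left commented out as a usage outline, and it assumes OCR() takes a screen-coordinate [x, y, w, h] array as in the earlier posts.

```autohotkey
; Hypothetical helper: a w-by-h box whose top-left corner is the point (x, y).
BoxAt(x, y, w := 25, h := 25)
{
	return [Round(x), Round(y), w, h]
}

; Usage outline (commented out; adjust keys and the expected value):
; CoordMode, Mouse, Screen         ; mouse position in screen coordinates
; F11::
;     MouseGetPos, mx, my
;     area := BoxAt(mx, my)        ; remembered for later scans
;     return
; F12::
;     text := OCR(area)            ; pass the array itself, no %...% here
;     if (Trim(text, " `t`r`n") = "expected value")
;         MsgBox, Match
;     return
```

Storing the rectangle in one variable means the scan hotkey can be pressed repeatedly, or called from a timer, without asking for the position again.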


Albireo · 02 Sep 2020, 08:15

iseahound wrote: ↑
01 Sep 2020, 09:01
… If you want to crop out some spaces [x, y, w, h] where x,y are the top left coordinate of the image, and w,h are the width and height of the image to keep. Open the image in an editor like paint to view the x,y coordinates of your image.

I had understood it that way (in the end).
I divided a row into several fields (columns), but one field, the quantity field, is not interpreted correctly by the OCR.
That area contains e.g. 10.0 (or 24.0 or ..), but Vis2 reads 10. or 24.
And when the area contains 6.0 or 8.0, I get nothing (empty result).
Right now I have no explanation for this result.

Another thing I think about is the time Vis2 needs to interpret the content.
Right now each row takes about 4 seconds. (20 rows = 80 seconds, 100rows = 400 seconds)

OCR Invoice - My test program (unfortunately the comments were written in Swedish)

Code: Select all

; Version 2 sept 2020
; 
;
#NoEnv
SetBatchLines -1
#SingleInstance force

#include <Vis2>  ; Equivalent to #include .\lib\Vis2.ahk

;  - - -  C r e a t e   a r r a y s  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
; Initialize support information
Support := []
Support.namn := "... ..."
Support.epost := "jan@......se"
Support.telefon := "0123 1234567"

imgPath := A_ScriptDir
imgName := "Condaco1.png"
imgFile := imgPath "\" imgName
If !FileExist(imgFile)
{	MsgBox 16, Row .: %A_LineNumber% -> %A_ScriptName%,
	( LTrim 
	 %	"PDF-fil saknas!
		`tFilnamn .: " imgName "
		Sökväg .: " imgPath "`n
		Kontakta .: " Support.namn " - " Support.telefon "`n
		Detta program kommer att avslutas!"
	)
	MsgBox ,,, Programmet avslutas!, 1
	ExitApp
}


; resPath := "C:\Users\Personal\Documents\Expo\Autohotkey\2- Projekt Faktura"
resPath := A_ScriptDir
resName := "Condaco1.txt"
resFile := resPath "\" resName
If FileExist(resFile)
{	FileDelete %resFile%
	If ErrorLevel
	{	MsgBox 16, Row .: %A_LineNumber% -> %A_ScriptName%,
		( LTrim 
		 %	"Filen kunde inte raderas.
			`tFilnamn .: " resName "
			Sökväg .: " resPath "`n
			Kontakta .: " Support.namn " - " Support.telefon "`n
			Detta program avslutas!"
		)
		MsgBox ,,, Programmet avslutas!, 1
		ExitApp
	}
}


StartTime := A_TickCount

; Custom structure
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Calculate column widths (values from Paint)
StartX := "160"	; Before the first column
StartY := "747"	; Y value (adjust for the ring on "Å" if needed = -6 pt). At the far left of the area
SlutY := "1432"	; Y value at the bottom right of the area (used to calculate the row height)

AntalRader = 29	; pcs	; Counted in the invoice

; Column widths - values from Paint
Kolumn1x := "357"		; After column 1 - Article number (X end of col 1)
Kolumn2x := "725"		; After column 2 - Description (X end of col 2)
Kolumn3x := "820"		; After column 3 - Quantity (X end of col 3)
Kolumn4x := "980"		; After column 4 - Unit price (X end of col 4)
Kolumn5x := "1170"	; After the last column (end of row) - Amount (quantity * unit price)

; Calculate the offset for each column.
1kolOff := Kolumn1x - StartX		; Offset (width) of column 1
2kolOff := Kolumn2x - Kolumn1x	; Offset (width) of column 2
3kolOff := Kolumn3x - Kolumn2x	; Offset (width) of column 3
4kolOff := Kolumn4x - Kolumn3x	; Offset (width) of column 4
5kolOff := Kolumn5x - Kolumn4x	; Offset (width) of column 5

; Calculate the row height
radHöjd := ( SlutY - StartY ) / antalRader
; The product lines have been structured - divided into rows / columns / fields
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

; Create an array of articles
Artikel := [{}]  ; Creates an array containing an object.

Runtime := []
Rows = 15
Loop % Rows
{	loopTime := A_TickCount
	Artikel[A_Index] := {}  ; Create an object in the array of articles
	
	; OCR(imgFile, "swe", [X, Y, Width, Height])
	
	Artikel[A_Index].ArtNr	 := OCR(imgFile, "swe", [StartX,   StartY + (radHöjd * (A_Index - 1)), 1KolOff, radHöjd]) ; "200330-550-80"
	Artikel[A_Index].Text 	 := OCR(imgFile, "swe", [Kolumn1x, StartY + (radHöjd * (A_Index - 1)), 2KolOff, radHöjd]) ; "ELLIOT MBLD SVART"
	Artikel[A_Index].Antal 	 := OCR(imgFile, "swe", [Kolumn2x, StartY + (radHöjd * (A_Index - 1)), 3KolOff, radHöjd]) ; "10.0"
	Artikel[A_Index].Pris 	 := OCR(imgFile, "swe", [Kolumn3x, StartY + (radHöjd * (A_Index - 1)), 4KolOff, radHöjd])	; "151.20"
	Artikel[A_Index].Belopp1 := OCR(imgFile, "swe", [Kolumn4x, StartY + (radHöjd * (A_Index - 1)), 5KolOff, radHöjd])	; "151.20"
	Artikel[A_Index].Belopp2 := Artikel[A_Index].Antal * Artikel[A_Index].Pris
	/*
	MsgBox 64, Row .: %A_LineNumber% -> %A_ScriptName%,
	( LTrim Join Comments
	 %	"Radhöjd = " radHöjd " punkter `n`n
		Art.No = " Artikel[A_Index].ArtNr "`n
		Art.Text = " Artikel[A_Index].Text "`n
		Art.Antal = " Artikel[A_Index].Antal "`n
		Art.Pris = " Artikel[A_Index].Pris "`n
		Art.Belopp Avläst = " Artikel[A_Index].Belopp1 "`n
		Art.Belopp Beräkn = " Artikel[A_Index].Belopp2 "`n`n"
		; Körtid .: " RunTime
	)
	*/
	Runtime[A_Index] := (A_TickCount - loopTime) / 1000
	; MsgBox 64, Row .: %A_LineNumber% -> %A_ScriptName%, % "- " Runtime.A_Index
}

Loop % Rows
{	MsgBox 64, Row .: %A_LineNumber% -> %A_ScriptName%,
	( LTrim Join Comments
	 %	"Rad .: " A_Index "`n`n
		Art.No = " Artikel[A_Index].ArtNr "`n
		Art.Text = " Artikel[A_Index].Text "`n
		Art.Antal = " Artikel[A_Index].Antal "`n
		Art.Pris = " Artikel[A_Index].Pris "`n
		Art.Belopp Avläst = " Artikel[A_Index].Belopp1 "`n
		Art.Belopp Beräkn = " Artikel[A_Index].Belopp2 "`n`n
		Radhöjd = " radHöjd " punkter `n`n
		Körtid .: " Runtime[A_Index] " sek.`n
		Total körtid .: " (A_TickCount - StartTime) / 1000 " sek."
	)
}



; MsgBox 64, Row .: %A_LineNumber% -> %A_ScriptName%, % "Radhöjd = " RadHöjd "punkter `n`nArt.No = " Art.No "`nArt.Text = " Art.Text "`nArt.Antal = " Art.Antal "`nArt.Pris = " Art.Pris "`nArt.Belopp Avläst = " Art.Belopp1 "`nArt.Belopp Beräkn = " Art.Belopp2 "`n`nKörtid .: " RunTime
; Wait until the analysis is complete.

MsgBox 64, Row .: %A_LineNumber% -> %A_ScriptName%, % "PDF-filen är analyserad och skapad!"

ExitApp

Esc:: ExitApp
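The row and column arithmetic in the script above can be collected into one helper that returns whole-pixel rectangles. With 29 rows between Y = 747 and Y = 1432 the row height is about 23.62, so the script passes fractional values to OCR(). Whether Vis2 minds that is an assumption I have not verified, but rounding the row edges keeps neighbouring rows aligned without gaps. CellRect and colEdges are hypothetical names. A sketch:

```autohotkey
; Hypothetical helper: crop rectangle [x, y, w, h] for one table cell.
; colEdges holds the x positions of the column borders, left to right.
CellRect(colEdges, startY, endY, rowCount, row, col)
{
	rowHeight := (endY - startY) / rowCount
	; Round both edges of the row so adjacent rows share the same border
	top    := Round(startY + rowHeight * (row - 1))
	bottom := Round(startY + rowHeight * row)
	left   := colEdges[col]
	right  := colEdges[col + 1]
	return [left, top, right - left, bottom - top]
}

; Column borders taken from the script above (StartX, Kolumn1x ... Kolumn5x)
edges := [160, 357, 725, 820, 980, 1170]
rect := CellRect(edges, 747, 1432, 29, 1, 3)   ; quantity field of the first row
; rect is [725, 747, 95, 24]
```

Inside the loop, each field would then be read with something like OCR(imgFile, "swe", CellRect(edges, StartY, SlutY, AntalRader, A_Index, 3)).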

iseahound · 02 Sep 2020, 12:26

There are three tessdata trained models:

https://github.com/tesseract-ocr/tessdata_best
https://github.com/tesseract-ocr/tessdata
https://github.com/tesseract-ocr/tessdata_fast

The tessdata_best folder is used when you call OCR() without the GUI.
When using the GUI the tessdata_fast folder is used.

I recommend you replace what is in tessdata_best with the tessdata or tessdata_fast model to get faster performance.

Regarding nothing showing up - you may have to change the x,y,w,h values by a pixel or two - that seems to have a large effect on the final outcome.
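The "move it by a pixel or two" advice can be automated. The sketch below retries with the rectangle shifted by small offsets until something non-empty comes back. OcrWithNudge is a hypothetical helper, and it assumes OCR is an ordinary function that can be wrapped with Func("OCR") and called as OCR(image, language, [x, y, w, h]), as in the posts above.

```autohotkey
; Hypothetical helper: retries an OCR call with the rectangle nudged by a few pixels.
; ocrFn is a function object; it is called as ocrFn.Call(image, lang, rect).
OcrWithNudge(ocrFn, image, lang, rect, maxShift := 2)
{
	; Offsets to try, closest first: 0, 1, -1, 2, -2, ...
	shifts := [0]
	Loop % maxShift
	{
		shifts.Push(A_Index)
		shifts.Push(-A_Index)
	}
	for i, dx in shifts
	{
		for j, dy in shifts
		{
			candidate := [rect[1] + dx, rect[2] + dy, rect[3], rect[4]]
			if (candidate[1] < 0 || candidate[2] < 0)
				continue	; skip rectangles that start outside the image
			text := Trim(ocrFn.Call(image, lang, candidate), " `t`r`n")
			if (text != "")
				return text
		}
	}
	return ""	; nothing readable at any offset
}

; Usage sketch (not run here):
; text := OcrWithNudge(Func("OCR"), imgFile, "swe", [725, 747, 95, 24])
```

With maxShift at 2 this can make up to 25 OCR calls for one field, so it is best reserved for fields that come back empty.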
