PaddleOCR - probably the best OCR tool available

newcod3r · Post by **newcod3r** » 13 Apr 2022, 20:21

unfortunately got this error after including btt lib though:

: image.png (6.68 KiB) Viewed 3331 times

Post by **boiler** » 13 Apr 2022, 21:44

Your AHK version must be older than v1.1.31 when Switch was introduced. Update to the latest version.

newcod3r · Post by **newcod3r** » 14 Apr 2022, 03:19

Thank you!

kairushinjuu · Post by **kairushinjuu** » 11 May 2022, 02:23

Just noticed that paddleocr is now v3 any chance this will also be updated?

crocodile · Post by **crocodile** » 02 Sep 2022, 13:39

Can this library be updated to version 2.6.0? This version seems to have a lot of improvements. Thanks.

TJ99 · Post by **TJ99** » 26 Sep 2022, 14:20

Hey there,
thanks for this awesome tool!

Everything works as expected here, but OCR on the mouse cursor. It is always empty, when I use that. All the other return the correct values.

iseahound · Post by **iseahound** » 28 Sep 2022, 19:19

I think tuzi included that for fun. Unless your mouse cursor happens to be an image with text in it:

Code: Select all

; Set your cursor to the AutoHotkey logo
ImagePutCursor("https://www.autohotkey.com/static/ahk_logo.png")

you will never get a result with OCR(A_Cursor)

TJ99 · Post by **TJ99** » 10 Oct 2022, 11:02

iseahound wrote: ↑
28 Sep 2022, 19:19
I think tuzi included that for fun. Unless your mouse cursor happens to be an image with text in it:
Code: Select all
; Set your cursor to the AutoHotkey logo
ImagePutCursor("https://www.autohotkey.com/static/ahk_logo.png")
you will never get a result with OCR(A_Cursor)

Oh, that is unfortunate.

Do you know perhaps, if it would be possible to OCR just a certain area on the screen then?
I've seen it is possible to OCR entire screens or single files, but could not find a way to analyze e.g. between x,y coordinates 100, 100 and 1000, 1000.

Mannaia666 · Post by **Mannaia666** » 03 Jan 2023, 05:28

Hello There, is there someone who can explain me briefly how to set the PaddleOcr to work with italian language?
Thanks!

Flowgun · Post by **Flowgun** » 23 Mar 2023, 20:01

I fixed the "WhiteSpace" problem. I tried changing the inference files with v3, but that didn't work. the DLLs need to be updated too.
This link seems to have the necessary DLLs for v3: https://gitee.com/raoyutian/paddle-ocrsharp
For different languages, I think it would be relatively easy to keep each language in its own folder, and add a parameter to the function that checks if a specific language was specified to use that specific language.
A GUI could be made that checks available folders and create a dropdown menu from them to send to that function for OCR. This way, adding a new language becomes as simple as downloading its files and unzipping.
I might explore these options one day, but I feel that I already spent too much time on it.

Here's the updated function for the "PaddleOCR.Ahk" file (working with whitespace). It also has the comments translated to English

Code: Select all

PaddleOCR(Image, Configs:="")
{
	static hModule, model, get_all_info, LastConfigs, DllPath := A_LineFile "\..\Dll"

    ; Verify running version
    if (A_PtrSize!=8)
    {
        MsgBox, 0x40010, , PaddleOCR must run on x64.
        ExitApp
    }

    
    ; The configuration file is generated when the first run or the value of Configs is passed in
    if (!hModule or IsObject(Configs))
    {
        ; Supported Configs options
        model                        := NonNull_Ret(Configs.model                       , model="" ? "server" : model)
        get_all_info                 := NonNull_Ret(Configs.get_all_info                , 0)
        
        use_gpu                      := NonNull_Ret(Configs.use_gpu                     , 0) ; Using the GPU requires installing CUDA toolkit: (2.6+GB) https://developer.nvidia.com/cuda-10.2-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exelocal
        gpu_id                       := NonNull_Ret(Configs.gpu_id                      , 0)
        gpu_mem                      := NonNull_Ret(Configs.gpu_mem                     , 4000)
        cpu_math_library_num_threads := NonNull_Ret(Configs.cpu_math_library_num_threads, 10)
        use_mkldnn                   := NonNull_Ret(Configs.use_mkldnn                  , 0)   ; CPU acceleration with AVX2 instructions. this requires AVX2 supporting CPU (intel 4th gen and/or higher?)
        
        max_side_len                 := NonNull_Ret(Configs.max_side_len                , 960)
        det_db_thresh                := NonNull_Ret(Configs.det_db_thresh               , 0.5)
        det_db_box_thresh            := NonNull_Ret(Configs.det_db_box_thresh           , 0.5)
        det_db_unclip_ratio          := NonNull_Ret(Configs.det_db_unclip_ratio         , 2.2)
        use_polygon_score            := NonNull_Ret(Configs.use_polygon_score           , 1)
        
        use_angle_cls                := NonNull_Ret(Configs.use_angle_cls               , 0)
        cls_thresh                   := NonNull_Ret(Configs.cls_thresh                  , 0.9)
        
        visualize                    := NonNull_Ret(Configs.visualize                   , 0)
        
        use_tensorrt                 := NonNull_Ret(Configs.use_tensorrt                , 0)
        use_fp16                     := NonNull_Ret(Configs.use_fp16                    , 0)
        
        ; Use faster or more accurate models
        model          := (model="fast" or model="mobile") ? "mobile" : "server"
        cls_model_dir  := % """" DllPath "\inference\mobile_cls\" """"
        det_model_dir  := % """" DllPath "\inference\" model "_det\" """"
        rec_model_dir  := % """" DllPath "\inference\" model "_rec\" """"
        char_list_file := % """" DllPath "\inference\dict.txt" """"
        
        ; config.txt template
        template=
        (LTrim
        use_gpu %use_gpu%                                             # Whether to use GPU. 1 means use, 0 means not use. I think this requires an nvidia GPu and installing the CUDA toolkit (2.6+GB): https://developer.nvidia.com/cuda-10.2-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exelocal
        gpu_id  %gpu_id%                                              # GPU id. Effective when using GPU.
        gpu_mem  %gpu_mem%                                            # The requested GPU memory.
        cpu_math_library_num_threads  %cpu_math_library_num_threads%  # The number of threads at the time of CPU prediction. When the number of machine cores is sufficient, the larger the value, the faster the prediction speed.
        use_mkldnn %use_mkldnn%                                       # Whether to use mkldnn library (for CPU acceleration). 1 means use, 0 means not use. requires AVX2 compatible CPU

        max_side_len  %max_side_len%                                  # If the length and width of the input image are greater than n, the image is scaled proportionally so that the longest side of the image is n.
        det_db_thresh  %det_db_thresh%                                # Used to filter the binarized image of DB prediction. Setting it to 0.-0.3 has no obvious effect on the result.
        det_db_box_thresh  %det_db_box_thresh%                        # DB post-processing filter box threshold. If there is a missing frame in the detection, it can be reduced as appropriate.
        det_db_unclip_ratio  %det_db_unclip_ratio%                    # Indicates how tight the text box 
        use_polygon_score %use_polygon_score%                         # Whether to use the polygon box to calculate the bbox score. 0 means using rectangular box calculation. The calculation speed of the rectangular box is faster, and the calculation of the polygonal box is more accurate for the curved text area.
		
        det_model_dir  %det_model_dir%                                # Check the location of the model.

        use_angle_cls %use_angle_cls%                                 # Whether to use the direction 
        cls_model_dir  %cls_model_dir%                                # The location of the direction 
        cls_thresh  %cls_thresh%                                      # The score threshold of the direction classifier.

        rec_model_dir  %rec_model_dir%                                # Identify the location of the model
        char_list_file  %char_list_file%                              # The location of the dictionary 

        visualize %visualize%                                         # Whether to visualize the results. When it is 1, the visual prediction result with the file name ocr_vis.png will be saved in the main code folder.

        use_tensorrt %use_tensorrt%                                   # Whether to use tensorrt.
        use_fp16 %use_fp16%                                           # Whether to use fp16.
        )
        if (template!=LastConfigs)
        {
            LastConfigs := template
            NeedToInit  := 1
        }
    }
    
    ; The default location of the sub-Dll that the search Dll depends on is the directory when the main code is running (this directory is invalid to change through SetWorkingDir).
    ; So if the main code and the 'dll file with sub-dependence' are not in the same directory, then you need to specify the location, otherwise an error will be reported that the Dll cannot be found.
    ; 3 methods.
    ; 1 is SetDllDirectory.
    ; 2 is LoadLibraryEx uses absolute path and adds LOAD_WITH_ALTERED_SEARCH_PATH option.
    ; 3 is to load all sub-dependent Dlls through LoadLibrary in advance.
    ; Because of LoadLibrary's feature of avoiding repeated loading based on file name.
    ; For example, LoadLibrary('c:a.dll') and then LoadLibrary('d:somedira.dll') get Dll in drive c.
    ; So method 3 is excluded here and method 1 is used.
    if (!hModule)
    {
        DllCall("SetDllDirectory", "str", DllPath)
        hModule := DllCall("LoadLibrary", "str", DllPath "\PaddleOCR.dll")
    }
    
    ; Setting changes require reinitialization
    if (NeedToInit)
    {
        DllCall("PaddleOCR\destroy")
        VarSetCapacity(config, StrPut(template, "cp0"))
        StrPut(template, &config, "cp0")
        DllCall("PaddleOCR\load_config", "str", config)
    }
    
    ; Load image into memory
    pStream := ImagePutStream(Image)
    DllCall("ole32\GetHGlobalFromStream", "ptr", pStream, "ptr*", hMemory)
    pMemory := DllCall("GlobalLock", "ptr", hMemory, "ptr")
    pSize   := DllCall("GlobalSize", "ptr", hMemory, "uptr")
    
    ; Whether to return all information including the recognized content, confidence and coordinates (JSON format)
    str := DllCall("PaddleOCR\ocr_from_binary", "ptr", pMemory, "int", pSize, "int", get_all_info, "str")
    
     ; Release memory resources
    DllCall("GlobalUnlock", "ptr", hMemory)
    DllCall("GlobalFree", "ptr", hMemory)
    ObjRelease(pStream)
    
    ; Fix the problem that JSON cannot be parsed due to wrong score
    if (get_all_info)
    {
        wrongChars = ,"score":-nan(ind),"range"
        rightChars = ,"score":-1,"range"
        str := StrReplace(str, wrongChars, rightChars)
        ; Fix the problem that str is empty and reports an error
        return, str="" ? "" : JSON.Load(str)
    }
    
    return, str
}

#Include %A_LineFile%\..\Lib\ImagePut.ahk
#Include %A_LineFile%\..\Lib\NonNull.ahk
#Include %A_LineFile%\..\Lib\JSON.ahk

Flowgun · Post by **Flowgun** » 23 Mar 2023, 20:43

Info in this link is helpful for updating "PaddleOCR.Ahk" to support latest version, but it's a bit over my head (although it seems to make things a lot easier and cleaner):
https://github.com/sdcb/PaddleSharp/blob/master/docs/ocr.md
Now we only need the hero that would put all these things together.

Galakrond · Post by **Galakrond** » 09 May 2023, 03:03

I have a few questions for its use,

I might have misunderstood. If so, I'm sorry.

For example, I will search for an image or pixel at the coordinates 200,200,200,200, and when it is found, I want the 1 key to be pressed.

I have 10 different images that I want to be searched at coordinates 200,200,200,200. Can I specify these images in the script and press the key I want quickly when the pictures appear at the specified coordinate?

Thanks for any help.

Post by **boiler** » 09 May 2023, 03:49

@Galakrond — Why would you use an OCR package for that? Do you know what OCR is?

Also, what is 200,200,200,200 supposed to be? A search rectangle? The only image that can be found in that rectangle is a size of one pixel.

iseahound · Post by **iseahound** » 12 May 2023, 23:25

Yep. This library uses x, y, w, h, so you can do: OCR([200, 200, 200, 200])

Post by **boiler** » 13 May 2023, 00:32

@iseahound - Good point, but I don't think he was even thinking about this library or even understands what it is or even what OCR is in general because said he's looking for an image or pixel, not trying to read text. He spammed a bunch of different threads asking the same thing (some disapproved for being totally off topic), and in the other ones (example) he was using 200, 200, 200, 200 as the search rectangle for PixelSearch, which is based on a X1, Y1, X2, Y2. He eventually said to ignore those coordinates.

terryys · Post by **terryys** » 20 Jul 2023, 02:52

Hi, I tried to read text from an image in a game. The image size is about 500x30 pixels, and contains 5 words. Somehow the library only recognize part of the first word and the last word. If I crop the image, e.g. only contains the first two words, it can get the two words correctly. However, if I include three words, it gets wrong results again. What should I do so that it can read all the words correctly?

AutoHotkey Community

PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available

Re: PaddleOCR - probably the best OCR tool available