Does anyone have a good OCR solution with V2?

Dgls
Posts: 12
Joined: 11 Jan 2023, 21:53

Does anyone have a good OCR solution with V2?

14 Jan 2023, 02:47

My use case is very straightforward. I want to locate very clear and specific text labels in the GUI of an app. They are quite limited in number and in where they can appear, and their size and font are fixed.
I'm really hoping for something that works with V2 test scripts out of the box. I'm working with a 3rd monitor placed above and to the right of the primary one (negative Y coordinates), which seems to make various AHK primitive functions not work (I have to use DllCall, etc.), so I'm wary of this and hoping for something that works OK without too much R&D...
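For reference, here is a rough sketch of how the monitor layout can be inspected. It assumes only the built-in v2 functions MonitorGetCount, MonitorGet and SysGet; the DescribeMonitors name is made up for illustration. A monitor sitting above the primary one should show a negative top value.

```autohotkey
#Requires AutoHotkey v2.0

; Work in absolute screen coordinates rather than window-relative ones.
CoordMode "Mouse", "Screen"
CoordMode "Pixel", "Screen"

; DescribeMonitors is a hypothetical helper name, not part of AHK.
; It returns one line per monitor plus the virtual-desktop bounds.
DescribeMonitors() {
    report := ""
    Loop MonitorGetCount() {
        MonitorGet(A_Index, &mLeft, &mTop, &mRight, &mBottom)
        report .= Format("Monitor {}: ({}, {}) to ({}, {})`n"
            , A_Index, mLeft, mTop, mRight, mBottom)
    }
    ; SysGet indexes 76..79 are SM_XVIRTUALSCREEN, SM_YVIRTUALSCREEN,
    ; SM_CXVIRTUALSCREEN and SM_CYVIRTUALSCREEN.
    report .= Format("Virtual screen: origin ({}, {}), size {}x{}"
        , SysGet(76), SysGet(77), SysGet(78), SysGet(79))
    return report
}

MsgBox DescribeMonitors()
```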
Thanks for any hints.
Dgls
Posts: 12
Joined: 11 Jan 2023, 21:53

Re: Does anyone have a good OCR solution with V2?

14 Jan 2023, 21:44

I've just tested Vis2, which seems to work. The problem is that it's written in V1.
I'm new to AHK and it seems only reasonable to do all my coding in V2, which is a more conventional and orthogonal language than V1.
V1 seems very idiosyncratic and ad hoc. I'd really rather not deal with it, if possible.
The problem is that so many examples and so much existing code are written in it....
mikeyww
Posts: 27324
Joined: 09 Sep 2014, 18:38

Re: Does anyone have a good OCR solution with V2?

14 Jan 2023, 22:15

A fantastic learning opportunity! You can translate the script and then post it! :)

I would also like to ask a small favor. If you can translate Gdip_All and also JSON.ahk at the same time, it would be great, even fantastic.

:roll:
Dgls
Posts: 12
Joined: 11 Jan 2023, 21:53

Re: Does anyone have a good OCR solution with V2?

15 Jan 2023, 02:14

thqby wrote:
14 Jan 2023, 22:23
https://github.com/thqby/ahk2_lib/tree/master/RapidOcr

This is a local OCR, which uses the CPU for inference.

Thanks, looks very promising. Are there any simple usage examples anywhere?
thqby
Posts: 433
Joined: 16 Apr 2021, 11:18

Re: Does anyone have a good OCR solution with V2?

15 Jan 2023, 07:50

At the back of the source code.
Dgls
Posts: 12
Joined: 11 Jan 2023, 21:53

Re: Does anyone have a good OCR solution with V2?

15 Jan 2023, 20:12

Great, the JPG file OCR test works for me!
I want to OCR the app window to determine the GUI layout state before I blindly send mouse clicks to it.
This will involve capturing smaller rectangles to test their contents.
Does RapidOcr have screen grab capabilities, or would ahk2_lib/wincapture be the way to do it?
BTW, thanks for the library.
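To make the rectangle idea concrete, here is a minimal sketch of turning window-relative label regions into absolute screen rectangles. It uses only the built-in WinGetPos; the RegionToScreen helper and the region values are hypothetical, and the actual capture and OCR calls are left out because I don't know the library's API yet. Negative window positions (as on a monitor above the primary) should need no special handling, though mixed DPI scaling across monitors might.

```autohotkey
#Requires AutoHotkey v2.0

; Convert a rectangle given relative to a window's top-left corner into
; absolute screen coordinates. Hypothetical helper, not a library function.
RegionToScreen(winX, winY, rel) {
    return { x: winX + rel.x, y: winY + rel.y, w: rel.w, h: rel.h }
}

; Made-up label positions, as if measured by hand from the app's layout.
regions := Map(
    "title",  { x: 20, y: 10,  w: 200, h: 24 },
    "status", { x: 20, y: 400, w: 300, h: 20 }
)

; "A" means the active window; substitute the app's own window title.
; WinGetPos measures the whole window including title bar and borders;
; WinGetClientPos is the built-in alternative for the client area.
WinGetPos(&winX, &winY, , , "A")
for name, rel in regions {
    r := RegionToScreen(winX, winY, rel)
    ; r.x, r.y, r.w, r.h would be handed to whatever capture routine is used.
    OutputDebug Format("{}: x={} y={} w={} h={}", name, r.x, r.y, r.w, r.h)
}
```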
Allen
Posts: 20
Joined: 29 Jan 2019, 10:01

Re: Does anyone have a good OCR solution with V2?

14 Mar 2024, 04:40

I had a similar situation last year when I needed to automate some tasks in an app that involved recognizing specific text labels. The regular tools I tried weren't cutting it, especially because my setup also included a multi-monitor arrangement with quirky coordinates. After a bit of searching and trial and error, I landed on an OCR solution that worked wonders for my project.

What really did the trick was using ID Analyzer's OCR Visual Data Scanning technology. Their advanced OCR could accurately pick up text from the GUI of my app without the fuss. It wasn't just the accuracy that impressed me, but how it handled different languages and tricky documents without any hitches. This made integrating into my V2 test scripts a breeze. Plus, their solution didn't require any complex setup to work with my monitor layout, which saved me a ton of R&D time. For anyone diving into similar projects, checking out their Identity Verification services could be a solution.
boiler
Posts: 17346
Joined: 21 Dec 2014, 02:44

Re: Does anyone have a good OCR solution with V2?

14 Mar 2024, 06:25

@Allen — Would you like to share the details of how you implemented their product into your AHK code? Is their OCR package standalone? Is it free?
iseahound
Posts: 1469
Joined: 13 Aug 2016, 21:04

Re: Does anyone have a good OCR solution with V2?

14 Mar 2024, 11:58

Yep. The fact is that paid APIs offer a much more compelling OCR solution than what's possible on the local system. You could also create an ensemble of APIs, where you use about 3 services and combine their results for a better truth model.
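A minimal sketch of the ensemble idea, assuming simple majority voting over the strings returned by each service (other combination schemes are possible). MajorityText is a hypothetical helper name, and the sample readings are made up rather than real API output.

```autohotkey
#Requires AutoHotkey v2.0

; Return the reading that the most engines agree on, or "" when no reading
; reaches minVotes. Comparison ignores case and surrounding whitespace.
MajorityText(readings, minVotes := 2) {
    counts := Map()
    counts.CaseSense := "Off"   ; must be set while the Map is still empty
    best := "", bestCount := 0
    for text in readings {
        key := Trim(text, " `t`r`n")
        if (key = "")
            continue            ; skip engines that returned nothing
        counts[key] := counts.Get(key, 0) + 1
        if (counts[key] > bestCount)
            best := key, bestCount := counts[key]
    }
    return bestCount >= minVotes ? best : ""
}

; Made-up results from three hypothetical OCR services.
MsgBox MajorityText(["Settings", "Sett1ngs", "Settings "])
```

With three services and minVotes left at 2, a label is accepted only when at least two of them agree, which is a cheap guard against a single engine misreading a character.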

If you're asking about integration, there are only 3 ways image data can be uploaded: use ImagePutBase64, ImagePutSafeArray, or CreateFormData.
