Post by Descolada » 16 Jul 2023, 00:43
@FanaticGuru, you are correct that in some cases pre- and postprocessing might improve results, but, at least in my practice, this isn't generalizable to all applications of OCR. For example, if you are dealing with white text on a black background, you might get much better performance from the OCR engine if you invert the colors of the image, but that same inversion might break OCR in all other cases.
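To illustrate the inversion idea, here is a minimal pure-Python sketch. The nested-list grayscale representation is just for illustration (my own toy format, not anything the OCR libraries use); a real script would use a library such as Pillow or OpenCV:

```python
# Minimal sketch: inverting a grayscale image so white-on-black text
# becomes black-on-white before passing it to an OCR engine.
# The image is a list of rows of 0-255 intensity values (toy format).

def invert(image):
    """Return a new image with every pixel value flipped (255 - v)."""
    return [[255 - v for v in row] for row in image]

# White "text" (255) on a black background (0)...
white_on_black = [
    [0,   0,   0],
    [0, 255,   0],
    [0,   0,   0],
]

# ...becomes black on white after inversion.
print(invert(white_on_black))  # [[255, 255, 255], [255, 0, 255], [255, 255, 255]]
```

Inverting twice returns the original image, so the operation is safe to apply conditionally (e.g. only when the average pixel value is below some brightness threshold).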
If you want maximum image recognition quality, you can improve every part of the OCR process: preprocessing, processing, and postprocessing. Which of these will work or give the best results depends on many factors, such as the quality of the image, the amount of noise, the text language, etc.
Preprocessing: processing the image for best quality before sending it to the OCR engine
1) Use better-quality input images. If you are reading the input image from a file (jpg, png), then increasing the original resolution and decreasing compression gives the OCR engine more information to work with and better results. This is not applicable to FromDesktop, FromWindow etc.
2) Converting to black-and-white, as @FanaticGuru mentioned, is a good technique and is built into many OCR preprocessors. I've used it in previous projects with good results, but those were cases where the image was supposed to be black-and-white in the first place. In such cases it works as a denoiser and might improve recognition accuracy, whereas on a color image (e.g. pink text on a yellow background) it might again totally break recognition (e.g. by converting both the background and the text to white...). Whether this is already integrated into UWP OCR, I don't know.
3) Deskewing - rotating the image so the text lines are horizontal. I believe UWP OCR does this internally, since it returns the TextAngle property, but using a custom algorithm might again give better results.
4) Sharpening, as mentioned by @malcev, is another technique to use. You could increase the contrast, use algorithms to remove salt-and-pepper noise, etc.
5) Line removal: removing straight lines from the input will reduce overall noise and should improve recognition accuracy.
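A rough pure-Python sketch of three of the steps above (black-and-white conversion, salt-and-pepper denoising, and horizontal line removal), using a toy nested-list grayscale format of my own choosing; in practice a library such as OpenCV implements all of these far more efficiently:

```python
# Toy implementations of three OCR preprocessing steps.
# Images are lists of rows of 0-255 grayscale values.

from statistics import median

def binarize(image, threshold=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[0 if v < threshold else 255 for v in row] for row in image]

def median_filter(image):
    """Replace each interior pixel with the median of its 3x3
    neighbourhood, removing isolated salt-and-pepper speckles."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out

def remove_horizontal_lines(image, min_run=10):
    """Whiten any horizontal run of black pixels longer than min_run,
    e.g. table borders or underlines that add noise for the engine."""
    out = [row[:] for row in image]
    for row in out:
        x = 0
        while x < len(row):
            if row[x] == 0:
                start = x
                while x < len(row) and row[x] == 0:
                    x += 1
                if x - start >= min_run:
                    for i in range(start, x):
                        row[i] = 255
            else:
                x += 1
    return out

# Binarization demo: dark pixels go to 0, light pixels to 255.
print(binarize([[30, 230], [240, 25]]))  # [[0, 255], [255, 0]]
```

A typical pipeline would chain these, e.g. `remove_horizontal_lines(median_filter(binarize(img)))`, with the threshold and run-length parameters tuned to the source material.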
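Deskewing can be sketched with a projection-profile search: try candidate angles and keep the one that concentrates the black pixels into the fewest rows. This is an illustrative toy of the general technique, not what UWP OCR actually does internally:

```python
import math

def estimate_skew(image, angles=None):
    """Estimate the skew angle (degrees) of a binary image by rotating
    the black-pixel coordinates by each candidate angle and keeping the
    angle that packs the pixels into the fewest rows (highest sum of
    squared row counts): straight text lines concentrate their pixels
    into few rows, skewed ones smear them across many."""
    if angles is None:
        angles = [a / 2 for a in range(-10, 11)]  # -5.0 .. +5.0 degrees
    points = [(x, y) for y, row in enumerate(image)
                     for x, v in enumerate(row) if v == 0]
    best_angle, best_score = 0.0, -1
    for angle in angles:
        rad = math.radians(angle)
        counts = {}
        for x, y in points:
            # y-coordinate of the point after rotating by -angle (deskewing)
            r = round(y * math.cos(rad) - x * math.sin(rad))
            counts[r] = counts.get(r, 0) + 1
        score = sum(c * c for c in counts.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle

# Toy test: a 40x8 white canvas with one line of black pixels
# skewed by 4 degrees.
width, height = 40, 8
skewed = [[255] * width for _ in range(height)]
for x in range(width):
    skewed[round(x * math.tan(math.radians(4)))][x] = 0

print(estimate_skew(skewed))  # → 4.0, the skew we introduced
```

Once the angle is known, the image itself would be rotated by the negative of it; real implementations also use the Hough transform or moments for this step.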
OCR engine selection is also important. Different OCR engines differ in the type and size of their neural networks, the size of the training data, etc. Tesseract is one of the better ones to use, but UWP OCR is built into Windows, so it's easier to use and will keep your script more compact size-wise (there is no separate engine to download).
Also, there are usually different models for different languages, so if you are not working with English you should use the appropriate language model.
Postprocessing the text is the final step. This means correcting mistakes in the output text: using word banks to fix typos (e.g. "imege" -> "image"), using n-grams or Markov chains to predict likely words, computing Levenshtein distances to find the closest dictionary word, etc. There are many methods to choose from, which you can find by Googling "OCR postprocessing methods".
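As an illustration of word-bank postprocessing, here is a minimal Levenshtein-based corrector. The three-word bank is a toy example of mine; real postprocessing would use a full dictionary, and ideally word frequencies, for the target language:

```python
# Toy dictionary-based OCR postprocessing: replace each output word
# with the closest word-bank entry if it is within a small edit distance.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def correct(word, word_bank, max_distance=2):
    """Return the nearest word-bank entry within max_distance edits,
    or the original word if nothing in the bank is close enough."""
    best = min(word_bank, key=lambda w: levenshtein(word, w))
    return best if levenshtein(word, best) <= max_distance else word

word_bank = ["image", "text", "recognition"]
print(correct("imege", word_bank))  # "image" (one substitution away)
```

The `max_distance` cutoff matters: without it, every piece of garbage output would be "corrected" to some unrelated dictionary word.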
I would say that most of these techniques are rather advanced, and I probably wouldn't use AHK to implement them, mostly because the number of available AHK libraries for image and text processing is still small, and image processing can require heavy computation for which AHK isn't well suited. Python or C++ might be better options for this.