[SOLVED] char() as hex values are not identified

ineuw
Posts: 172
Joined: 11 Sep 2014, 14:12

[SOLVED] char() as hex values are not identified

21 Apr 2021, 20:30

I clean text documents created by OCR (optical character recognition) software. I need to convert a series of hexadecimal character values, which appear as single or double quote characters, to the standard double quote chr(0x0022) character. But I am having no luck. Can someone please look at what I am doing wrong?

Code: Select all

; 2021-04-21 21:00

; D:\ahk\en-clean1.ahk

; english 1st text cleaning

; alt+.                        clean text
!.::
critical, on
clipboard:=""
autotrim, on
send, ^a^c

; quotations and accents

in_put = chr(0x1000B4) ; U+00B4 ACUTE ACCENT
out_put = chr(0x100022)
clipwait, 4
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 4

in_put = chr(0x1002BB) ; MODIFIER LETTER TURNED COMMA
out_put = chr(0x100022)
clipwait, 4
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 4

in_put = chr(0x1002CA) ; MODIFIER LETTER ACUTE ACCENT
out_put = chr(0x100022)
clipwait, 4
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 4

in_put = chr(0x1002CB) ; MODIFIER LETTER GRAVE ACCENT
out_put = chr(0x100022)
clipwait, 4
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 4

in_put = chr(0x1002DD) ; DOUBLE ACUTE ACCENT
out_put = chr(0x100022)
clipwait, 4
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 4

in_put = chr(0x100300) ; COMBINING GRAVE ACCENT : Greek varia
out_put = chr(0x100022)
clipwait, 4
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 4

in_put = chr(0x100301) ; COMBINING ACUTE ACCENT : stress mark, Greek oxia, tonos
out_put = chr(0x100022)
clipwait, 4
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 4

in_put = chr(0x10030B) ; COMBINING DOUBLE ACUTE ACCENT
out_put = chr(0x100022)
clipwait, 4
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 4

in_put = chr(0x10030F) ; COMBINING DOUBLE GRAVE ACCENT
out_put = chr(0x100022)
clipwait, 4
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 4

in_put = chr(0x102018) ; LEFT SINGLE QUOTATION MARK : single turned comma quotation mark
out_put = chr(0x100022)
clipwait, 4
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 4

in_put = chr(0x102019) ; RIGHT SINGLE QUOTATION MARK : single comma quotation mark
out_put = chr(0x100022)
clipwait, 4
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 4

in_put = chr(0x10201B) ; SINGLE HIGH-REVERSED-9 QUOTATION MARK : single reversed comma quotation mark
out_put = chr(0x100022)
clipwait, 4
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 4

in_put = chr(0x10201C) ; LEFT DOUBLE QUOTATION MARK : double turned comma quotation mark
out_put = chr(0x100022)
clipwait, 4
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 4

in_put = chr(0x10201D) ; RIGHT DOUBLE QUOTATION MARK : double comma quotation mark
out_put = chr(0x100022)
clipwait, 4
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 4

in_put = chr(0x10002A) . chr(0x10002A) ** ASTERISK : star (on phone keypads)
out_put = chr(0x100022)
clipwait, 4
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 4

sendinput, ^v
sendinput, ^{Home}
return
Last edited by ineuw on 24 Apr 2021, 02:12, edited 1 time in total.
Win 10 Professional 64bit 21H2 16Gb Ram AHK current as of 2021-12-26 .
mikeyww
Posts: 26601
Joined: 09 Sep 2014, 18:38

Re: char() as hex values are not identified

21 Apr 2021, 20:42

I might be missing the point completely, but if you want to use expressions rather than literal strings, use :=. Demo is below.

Code: Select all

out_put = chr(0x100022)
MsgBox %out_put%
out_put := chr(0x100022)
MsgBox %out_put%
Almost all of your ClipWait commands will (potentially) fail, because you need to set Clipboard to null each time before you reassign it.
ineuw
Posts: 172
Joined: 11 Sep 2014, 14:12

Re: char() as hex values are not identified

21 Apr 2021, 20:52

@mikeyww. Thanks for the excellent suggestions, I will implement and test them immediately.
mikeyww
Posts: 26601
Joined: 09 Sep 2014, 18:38

Re: char() as hex values are not identified

21 Apr 2021, 21:06

I don't know whether your hex numbers will have valid output, but here is a general idea to streamline the approach a bit.

Code: Select all

replace := [0x1000B4, 0x1002BB, 0x1002CA, 0x1002CB, 0x1002DD, 0x100300, 0x100301, 0x10030B, 0x10030F
          , 0x102018, 0x102019, 0x10201B, 0x10201C, 0x10201D]
new := Chr(0x100022)

!.::
Clipboard =
Send ^a^c
ClipWait, 0
If ErrorLevel {
 MsgBox, 48, Error, An error occurred while waiting for the clipboard. Aborting.
 Return
} Else ttext := Clipboard
For k, v in replace
 ttext := StrReplace(ttext, Chr(v), new)
Clipboard := "", Clipboard := StrReplace(ttext, Chr(0x10002A) Chr(0x10002A), new)
ClipWait, 0
If ErrorLevel
 MsgBox, 48, Error, An error occurred while waiting for the clipboard. Aborting.
Else Send ^v^{Home}
Return
ineuw
Posts: 172
Joined: 11 Sep 2014, 14:12

Re: char() as hex values are not identified

21 Apr 2021, 21:44

mikeyww wrote:
21 Apr 2021, 21:06
I don't know whether your hex numbers will have valid output, but here is a general idea to streamline the approach a bit.
Again thanks. I had no luck with the hex values so far. I altered the script using the "monkey see, monkey do" programming principle and substituted literal characters for the hexadecimal values wherever the script had ignored them. I copied the unchanged punctuation in as literal values, following the recommendation in your first reply. So far, on a random page test, the following changes work.

Code: Select all

; 2021-04-21 21:00

; D:\ahk\en-clean1.ahk

; english 1st text cleaning

; alt+.                        clean text
!.::
critical, on
clipboard := ""
autotrim, on
send, ^a^c
clipwait, 0

; surround mdash with spaces
in_put := "—"
out_put := " — "
clipwait, 2
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 2

; replace two spaces with one
in_put := A_Space . A_Space
out_put := A_Space
clipwait, 2
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 2

; quotations and accents

; ’’
in_put := "’’"
out_put := chr(0034)
clipwait, 2
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 2

; ’
in_put := "’"
out_put := chr(0034)
clipwait, 2
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 2

in_put := "‘"
out_put := chr(0034)
clipwait, 2
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 2

in_put := "”"
out_put := chr(0034)
clipwait, 2
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 2

sendinput, ^v
sendinput, ^{Home}
return
boiler
Posts: 16771
Joined: 21 Dec 2014, 02:44

Re: char() as hex values are not identified

22 Apr 2021, 00:02

One big problem with your attempts at using hex is the values are wrong. For some reason you have, for example, 0x100022 which is 1,048,610 in decimal when you meant it to be 34 decimal. I'm guessing you took 0x1 to be the prefix for a hex number, but it's not, it's 0x. The 1 doesn't belong there. It should be 0x22 (or even 0x00022 since leading zeros don't matter) to be 34 in decimal. The same for the leading 1 you have in all the others. The 1 is adding either 65,536 or 1,048,576 to what you intended the numbers to be, depending on whether you have 4 or 5 digits following it.

You can check whether the values result in what you expect like this:

Code: Select all

MsgBox, % Chr(0x1000B4) ; incorrect
MsgBox, % Chr(0x00B4) ; correct

By the way, where you have chr(0034), only chr(34) is necessary since the leading zeros don't do anything in decimal either.
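
To make the arithmetic above concrete, here is a minimal sketch for AutoHotkey v1.1 (Unicode build). The variable names wrong and right are only illustrative. It shows the size of the stray offset that the extra 1 introduces, and then the character that the intended value produces.

```autohotkey
; Sketch only, AutoHotkey v1.1 Unicode build; variable names are illustrative.
wrong := 0x100022   ; value used in the first script (1,048,610 in decimal)
right := 0x22       ; intended code point of the double quote (34 in decimal)

; Show the stray offset in decimal and in hex.
MsgBox, % "Offset in decimal: " . Format("{:d}", wrong - right)
MsgBox, % "Offset in hex: " . Format("0x{:X}", wrong - right)

; Show the character that the intended value produces.
MsgBox, % "Chr(0x22) gives: " . Chr(right)
```

The same subtraction can be repeated for any of the other values in the first script to see how far each one is from the code point named in its comment.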
ineuw
Posts: 172
Joined: 11 Sep 2014, 14:12

Re: char() as hex values are not identified

22 Apr 2021, 01:00

boiler wrote:
22 Apr 2021, 00:02
One big problem with your attempts at using hex is the values are wrong. For some reason you have, for example, 0x100022 which is 1,048,610 in decimal when you meant it to be 34 decimal. I'm guessing you took 0x1 to be the prefix for a hex number, but it's not, it's 0x. The 1 doesn't belong there. It should be 0x22 (or even 0x00022 since leading zeros don't matter) to be 34 in decimal. The same for the leading 1 you have in all the others. The 1 is adding either 65,536 or 1,048,576 to what you intended the numbers to be, depending on whether you have 4 or 5 digits following it.

You can check whether the values result in what you expect like this:

Code: Select all

MsgBox, % Chr(0x1000B4) ; incorrect
MsgBox, % Chr(0x00B4) ; correct

By the way, where you have chr(0034), only chr(34) is necessary since the leading zeros don't do anything in decimal either.
Thanks for the MsgBox code. Unless I misunderstood, "0x100022" was the recommended value when using the AHK Unicode version.

After @mikeyww's direction, I went over the results and rewrote the script completely. The Tesseract OCR software used by Wikipedia negated my assumptions about commas and quotation marks being converted to esoteric characters of the Unicode table. The following script produces 99% accuracy, page after page.

I had to reorder some of the "find & replace" code segments, and then added exception code segments to the end of the script. I am sure that the code can be refined, but at this point, my productivity is up and I am getting work done.

Thanks to all.

Code: Select all

; 2021-04-21 21:00
; D:\ahk\en-clean1.ahk
; english 1st text cleaning

; alt+.                        clean text
!.::
critical, on
clipboard := ""
autotrim, on
send, ^a^c
clipwait, 0

; surround mdash with spaces
in_put := "—"
out_put := A_Space . in_put . A_Space
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; replace two spaces with one
in_put := A_Space . A_Space
out_put := A_Space
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; quotations and accents

; literal = ’’
in_put := "’’"
out_put := chr(0034)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal = ’
in_put := "’"
out_put := chr(0039)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal = " '"
in_put := A_Space . chr(0039)
out_put := A_Space . chr(0034)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal = ‘
in_put := "‘"
out_put := chr(0039)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal = “
in_put := "“"
out_put := chr(0034)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal = ”
in_put := "”"
out_put := chr(0034)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal = '*
in_put := chr(0039) . "*"
out_put := chr(0034)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal = "*
in_put := chr(0034) . "*"
out_put := chr(0034)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal = **
in_put := "**"
out_put := chr(0034)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal = '?
in_put := "'?"
out_put := chr(0034)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal = "?
in_put := chr(0034) . "?"
out_put := chr(0034)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal = "" two double quotes side by side
in_put := chr(0034)chr(0034)
out_put := chr(0034)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal = ,'
in_put := "," . "'"
out_put := "." . chr(0034)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal = ''
in_put := "''"
out_put := chr(0034)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal = "'
in_put := chr(0034) . chr(0039)
out_put := chr(0034)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal = '"
in_put := chr(0039) . chr(0034)
out_put := chr(0034)
clipwait, 1
clipboard := strreplace(clipboard, in_put, out_put)
clipwait, 1

; literal cr + " + space
in_put := chr(0013) . chr(0010) . chr(0034) . chr(0032)
out_put := chr(0013) . chr(0010) . chr(0034)
clipwait, 1
clipboard := regexreplace(clipboard, in_put, out_put)
clipwait, 1


sendinput, ^v
sendinput, ^{Home}
return
boiler
Posts: 16771
Joined: 21 Dec 2014, 02:44

Re: char() as hex values are not identified

22 Apr 2021, 01:11

ineuw wrote: Unless I misunderstood, "0x100022" was the recommended value when using the AHK Unicode version.
I don't know where you saw that, but the MsgBox proves that's not the case.
ineuw
Posts: 172
Joined: 11 Sep 2014, 14:12

Re: char() as hex values are not identified

22 Apr 2021, 01:22

boiler wrote:
22 Apr 2021, 01:11
ineuw wrote: Unless I misunderstood, "0x100022" was the recommended value when using the AHK Unicode version.
I don't know where you saw that, but the MsgBox proves that's not the case.
Please look under Chr() in the AutoHotkey manual, version 1.1.33.02. Now, I admit that this may be a typo, but I had to try. Also please check the second script and ignore the first.
mikeyww
Posts: 26601
Joined: 09 Sep 2014, 18:38

Re: char() as hex values are not identified

22 Apr 2021, 05:16

ClipWait has no effect unless you clear the clipboard before setting it. Assigning the clipboard repeatedly in rapid succession can lead to failure (which is why the ClipWait command exists). Except the first and last assignments in your script, you do not need to use the clipboard at all. Due to the clipboard's delay, working with an intermediate variable will be faster.
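
As a rough illustration of the intermediate-variable approach, the sketch below (AutoHotkey v1.1, Unicode build) touches the clipboard only at the start and at the end. It is not the thread author's final script: the names pairs and cleaned are hypothetical, the pair list is a small sample, and the code points are the standard Unicode values for the curly quotes without the extra leading 1.

```autohotkey
; Sketch only, AutoHotkey v1.1 Unicode build; pairs and cleaned are illustrative names.
!.::
; Each entry is [search, replacement]; entries are applied in order.
pairs := [[Chr(0x2019) . Chr(0x2019), Chr(0x22)]
        , [Chr(0x201C), Chr(0x22)]
        , [Chr(0x201D), Chr(0x22)]
        , [Chr(0x2018), Chr(0x27)]
        , [Chr(0x2019), Chr(0x27)]]
Clipboard := ""                ; empty the clipboard so ClipWait can detect the copy
Send, ^a^c
ClipWait, 2
If (ErrorLevel)
{
    MsgBox, 48, Error, Nothing arrived on the clipboard.
    Return
}
cleaned := Clipboard           ; work on a plain variable from here on
For index, pair in pairs
    cleaned := StrReplace(cleaned, pair[1], pair[2])
Clipboard := ""                ; clear again before the final assignment
Clipboard := cleaned
ClipWait, 2
If (!ErrorLevel)
    Send, ^v^{Home}
Return
```

Because every replacement runs on the variable, only two clipboard assignments remain, and each one is preceded by a clear so that ClipWait has something to wait for. The doubled right single quote is listed before the single one because the order of the replacements matters.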
boiler
Posts: 16771
Joined: 21 Dec 2014, 02:44

Re: char() as hex values are not identified

22 Apr 2021, 06:32

ineuw wrote: Please look under Chr() in the AutoHotkey manual, version 1.1.33.02. Now, I admit that this may be a typo, but I had to try. Also please check the second script and ignore the first.
It’s neither a typo nor says to put a 1 in front of the hex value. It says:
Chr() documentation wrote:If Unicode is supported, Number is a Unicode character code between 0 and 0x10FFFF (or 0xFFFF prior to [v1.1.21]); otherwise it is an ANSI character code between 0 and 255.
When it says "between 0 and 0x10FFFF," that means numbers anywhere from 0 to 0x10FFFF. The minimum is 0 and the maximum is 0x10FFFF, which is 1,114,111 in decimal. Numbers like 0x0022 (34 in decimal) are between 0 and 0x10FFFF. It doesn’t say to put a 1 in front of any number that doesn’t already have one.
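
One way to avoid guessing at code points is to ask AutoHotkey for them. The sketch below assumes AutoHotkey v1.1.21 or later (Unicode build), where Ord() is available. The hotkey Ctrl+Alt+U is a hypothetical choice. Copy a suspect character from the OCR text, press the hotkey, and the code point of the first character on the clipboard is shown in the same 0x notation that Chr() accepts.

```autohotkey
; Sketch only, AutoHotkey v1.1.21+ Unicode build; the hotkey choice is arbitrary.
^!u::
If (Clipboard = "")
{
    MsgBox, 48, Error, The clipboard is empty.
    Return
}
; Ord() returns the code point of the first character of the string.
MsgBox, % "Code point: " . Format("0x{:04X}", Ord(Clipboard))
Return
```

The reported value can be passed to Chr() as is, with no extra digits in front of it.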
ineuw
Posts: 172
Joined: 11 Sep 2014, 14:12

Re: char() as hex values are not identified

22 Apr 2021, 12:38

Now you know why I am not the keeper of the launch codes. Especially at night.
ineuw
Posts: 172
Joined: 11 Sep 2014, 14:12

Re: char() as hex values are not identified

22 Apr 2021, 15:52

mikeyww wrote:
22 Apr 2021, 05:16
ClipWait has no effect unless you clear the clipboard before setting it. Assigning the clipboard repeatedly in rapid succession can lead to failure (which is why the ClipWait command exists). Except the first and last assignments in your script, you do not need to use the clipboard at all. Due to the clipboard's delay, working with an intermediate variable will be faster.
Thanks for this valuable reminder. Truly drifted away from programming. Must quit retirement.

The truth is that I use AHK constantly, but rarely need to write more sophisticated scripts. My scripts are variations on the same theme 90 percent of the time. And the rest of the time, I end up here asking questions.