AutoHotkey Community

All times are UTC [ DST ]




 Post subject: Unicode
Posted: August 28th, 2008, 10:07 pm

Joined: April 6th, 2008, 5:37 am
Posts: 13
The newest stuff about AutoHotkey and Unicode in the forum seems to be a couple years old... nothing recent in the changelog about Unicode...

In the last couple years more and more programs have come to use Unicode effectively; I finally put my emacs into "save in UTF-8 by default" mode, and most everything is happy...

Except AutoHotkey... scripts containing UTF-8 characters are interpreted by AutoHotkey as sequences of 8-bit characters, and hotstrings containing UTF-8 characters insert those sequences of 8-bit characters instead.

What is the minimal replacement for a hotstring such as

::bt@::Buy this ąţ 75¢

(Note the Unicode characters -- the sentence is nonsensical with those characters, but ą and ţ are not in Latin-1, so the sentence does make a good test case.)

Or is it best to find a different program, leaving AutoHotkey behind? Or is there even a different program that can handle this sort of thing?


Posted: August 28th, 2008, 11:44 pm

Joined: April 18th, 2008, 7:57 am
Posts: 1390
Location: The Interwebs
The SendU() function allows you to send Unicode characters through AutoHotkey.


Posted: August 29th, 2008, 12:23 am

Joined: April 6th, 2008, 5:37 am
Posts: 13
OK, I guess you meant SendInputU instead of SendU...

However, this doesn't seem like a drop-in replacement for

::bt@::Buy this ąţ 75¢

Now I'm willing to do a bit of work to get this right, but

-> I see two different implementations, one claims to have improved performance. When I count the DllCall calls, I only get a 6 to 4 reduction, not the claimed 20 to 4 reduction, which convinces me I really don't understand one or the other of the implementations (or probably both).

-> I see a comment about making backspace work universally, but it is not clear where to put the code snippet.

-> I see nothing about how to integrate this into hotstrings... I guess one would have to use the command form of hotstrings...

-> The interface seems to require 4 hex digits per character to be sent, which is mostly unreadable. Yes, comments are available, but it is painstaking to create such things.

So it seems that AHK is still not quite useful for Unicode stuff... can be done, but requires lots of code, and is painstaking to use?

Has someone already wrapped this into something friendlier?


Posted: August 29th, 2008, 1:33 am

Joined: April 18th, 2008, 7:57 am
Posts: 1390
Location: The Interwebs
guruglenn wrote:
-> I see two different implementations, one claims to have improved performance. When I count the DllCall calls, I only get a 6 to 4 reduction, not the claimed 20 to 4 reduction, which convinces me I really don't understand one or the other of the implementations (or probably both).

I'll break it down for you.
Code:
EncodeInteger( p_value, p_size, p_address, p_offset )
{
   loop, %p_size% ;one DllCall per p_size
      DllCall( "RtlFillMemory"
         , "uint", p_address+p_offset+A_Index-1
         , "uint", 1
         , "uchar", ( p_value >> ( 8*( A_Index-1 ) ) ) & 0xFF )
}

SendInputU( p_text )
{
   StringLen, len, p_text

   INPUT_size = 28
   
   event_count := ( len//4 )*2
   VarSetCapacity( events, INPUT_size*event_count, 0 )

   loop, % event_count//2
   {
      StringMid, code, p_text, ( A_Index-1 )*4+1, 4
     
      base := ( ( A_Index-1 )*2 )*INPUT_size+4
         EncodeInteger( 1, 4, &events, base-4 ) ;four dllcalls per char
         EncodeInteger( "0x" code, 2, &events, base+2 ) ;two dllcalls per char
         EncodeInteger( 4, 4, &events, base+4 ) ;four dllcalls per char  ; KEYEVENTF_UNICODE

      base += INPUT_size
         EncodeInteger( 1, 4, &events, base-4 ) ;four more per char
         EncodeInteger( "0x" code, 2, &events, base+2 ) ;2 more per char
         EncodeInteger( 2|4, 4, &events, base+4 ) ;4 more per char  ; KEYEVENTF_KEYUP|KEYEVENTF_UNICODE
   }
   
   result := DllCall( "SendInput", "uint", event_count, "uint", &events, "int", INPUT_size ) ;one last one at the end
   if ( ErrorLevel or result < event_count )
   {
      MsgBox, [SendInput] failed: EL = %ErrorLevel% ~ %result% of %event_count%
      return, false
   }
   
   return, true
}

Total of 20 per char + 1 at the end
Code:
SendInputU( p_text ) ; 4 DllCalls/char + 1, reduced from 20/char + 1
{
   event_count := ( StrLen(p_text)//4 )*2
   VarSetCapacity( events, 28*event_count, 0 )
   base = 0
   Loop % event_count//2
   {
      StringMid code, p_text, 4*A_Index-3, 4
      code = 0x4%code%

      DllCall("RtlFillMemory", "uint", &events + base, "uint",1, "uint", 1) ; one per char!
      DllCall("ntoskrnl.exe\RtlFillMemoryUlong", "uint",&events+base+6, "uint",4, "uint",code) ;one more per char
      base += 28

      DllCall("RtlFillMemory", "uint", &events + base, "uint",1, "uint", 1) ;another per char
      DllCall("ntoskrnl.exe\RtlFillMemoryUlong", "uint",&events+base+6, "uint",4, "uint",code|(2<<16)) ;one last one per char
      base += 28
   }
   result := DllCall( "SendInput", "uint", event_count, "uint", &events, "int", 28 ) ;and one at the end
   if ( ErrorLevel or result < event_count )
      MsgBox SendInput failed`nErrorLevel = %ErrorLevel%`n%result% events of %event_count%
}

Total of 4 per char + 1 at the end
Hope that makes it a bit clearer.

Quote:
-> I see a comment about making backspace work universally, but it is not clear where to put the code snippet.

Not 100% sure, try playing around with it in different places.

Quote:
-> I see nothing about how to integrate this into hotstrings... I guess one would have to use the command form of hotstrings...

Yeah, something like:
Code:
::bt@::
SendInput, Buy this%A_Space%
SendInputU("01050163") ;hex code for ąţ
SendInput, %A_Space%75¢
Return

Not tested, as I use SendU() rather than SendInputU(). But I'm pretty sure it will work. The hex codes are certainly correct.
Quote:
-> The interface seems to require 4 hex digits per character to be sent, which is mostly unreadable. Yes, comments are available, but it is painstaking to create such things.
<snip>
Has someone already wrapped this into something friendlier?

This is true, which is why Unicode support is planned for the future. And I don't see any way to wrap this in a friendlier way, because AHK lacks basic support for Unicode.

Quote:
So it seems that AHK is still not quite useful for Unicode stuff... can be done, but requires lots of code, and is painstaking to use?

I suppose. But it still can be very useful; I use it a lot when I have to type up things for my Mandarin class.


Posted: August 29th, 2008, 1:43 am

Joined: April 6th, 2008, 5:37 am
Posts: 13
I've figured out the count of 20 DllCall calls now.

Seems the event structure is 28 bytes, but only 3 fields of 10 bytes are used. But the optimized form only writes 5 bytes, because the other 5 are always zero for this use case.
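
For reference, the three fields can be written out one by one with NumPut (AHK 1.0.47+) instead of RtlFillMemory. This is only a sketch, assuming 32-bit AutoHotkey, where an INPUT structure is 28 bytes; SendCharU is a hypothetical name, not one of the functions from the SendInputU thread.

```autohotkey
; Sketch: send one UTF-16 code unit, filling two INPUT structures with NumPut.
; Assumes 32-bit AutoHotkey (INPUT = 28 bytes). SendCharU is a hypothetical helper.
SendCharU( code ) ; code = UTF-16 code unit, e.g. 0x0105 for ą
{
   VarSetCapacity( events, 56, 0 )        ; two zero-filled INPUT structures

   ; key-down event
   NumPut( 1,    events,  0, "UInt" )     ; type    = INPUT_KEYBOARD (4 bytes at offset 0)
   NumPut( code, events,  6, "UShort" )   ; wScan   = the code unit  (2 bytes at offset 6)
   NumPut( 4,    events,  8, "UInt" )     ; dwFlags = KEYEVENTF_UNICODE (4 bytes at offset 8)

   ; key-up event, same fields 28 bytes further on
   NumPut( 1,    events, 28, "UInt" )
   NumPut( code, events, 34, "UShort" )
   NumPut( 6,    events, 36, "UInt" )     ; KEYEVENTF_KEYUP|KEYEVENTF_UNICODE

   ; returns the number of events that were inserted (2 on success)
   return DllCall( "SendInput", "uint", 2, "uint", &events, "int", 28 )
}
```

The optimized SendInputU gets away with fewer writes because its 4-byte write at offset 6 covers wScan and the low half of dwFlags in one go.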


So it is interesting that Skype doesn't accept the strings generated this way, as someone in the other thread noted. Skype is a strange software beast, but it seems to accept Unicode paste-ins from the clipboard... so why not from SendInputU ?


The hardest part about making this useful looks like it is going to be the code to convert a UTF-16 string into a hex string for the SendInputU interface.


Posted: August 29th, 2008, 1:52 am

Joined: April 18th, 2008, 7:57 am
Posts: 1390
Location: The Interwebs
Here's the converter I use.

Also, there are other modes (like using the clipboard) that you can use, shown in the Extended SendU function thread.


Posted: August 29th, 2008, 1:57 am

Joined: April 6th, 2008, 5:37 am
Posts: 13
I was in the midst of replying again, when you did, so my last reply was before reading your last reply...

Krogdor wrote:
Quote:
-> I see a comment about making backspace work universally, but it is not clear where to put the code snippet.

Not 100% sure, try playing around with it in different places.


Hmm. Maybe one has to notice values of 8 in the string and use the different code conditionally. I don't think I'll need embedded backspaces, though. And if I do, why not use SendInput instead? I suppose because of the convenience of bundling everything into a single string.

Krogdor wrote:
Quote:
-> I see nothing about how to integrate this into hotstrings... I guess one would have to use the command form of hotstrings...

Yeah, something like:
Code:
::bt@::
SendInput, Buy this%A_Space%
SendInputU("01050163") ;hex code for ąţ
SendInput, %A_Space%75¢
Return

Not tested, as I use SendU() rather than SendInputU(). But I'm pretty sure it will work. The hex codes are certainly correct.


I've tested it, yes it works. Thanks for confirming my suspicions.

So what is the difference between SendU and SendInputU? I don't find SendU anywhere in the link you gave...

Quote:
And I don't see any way to wrap this in a more friendly way, because of AHK's lack of basic support for unicode.


Interestingly, this code
Code:
Ansi2Unicode(ByRef sString, ByRef wString, CP = 0)
{
     nSize := DllCall("MultiByteToWideChar"
      , "Uint", CP
      , "Uint", 0
      , "Uint", &sString
      , "int",  -1
      , "Uint", 0
      , "int",  0)

   VarSetCapacity(wString, nSize * 2)

   DllCall("MultiByteToWideChar"
      , "Uint", CP
      , "Uint", 0
      , "Uint", &sString
      , "int",  -1
      , "Uint", &wString
      , "int",  nSize)
}

found elsewhere on the forum, together with code like
Code:
utf8String := "Buy this ąţ 75¢"
Ansi2Unicode(utf8String, wString, 65001)

and the use of UTF-8 encoding on the .ahk files (without BOM to avoid confusing AHK) will generate the UTF-16 at the expense of 2 DllCall calls per string, but giving the immense convenience of not having to code hex UTF-16 codes.

Now how to convert the UTF-16 string to the hex string desired by SendInputU. Or is SendU friendlier for this?
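
One possible approach, sketched with NumGet (AHK 1.0.47+): walk the buffer one 16-bit code unit at a time and append four hex digits for each. WStrToHex is a hypothetical helper, not part of SendInputU, and it assumes wString was filled by Ansi2Unicode above, so that it ends in a 16-bit zero.

```autohotkey
; Sketch: turn a null-terminated UTF-16 buffer into the hex string SendInputU expects.
; WStrToHex is a hypothetical helper; it assumes the buffer came from Ansi2Unicode.
WStrToHex( ByRef wString )
{
   oldFormat := A_FormatInteger          ; remember the caller's integer format
   SetFormat, Integer, Hex
   hex := ""
   Loop
   {
      code := NumGet( wString, 2*A_Index-2, "UShort" ) ; next 16-bit code unit
      if ( code = "" or code = 0 )       ; end of buffer or terminating null
         break
      code += 0x10000                    ; now 0x1XXXX, so XXXX keeps its leading zeros
      hex .= SubStr( code, 4 )           ; drop the "0x1" prefix, keep the 4 digits
   }
   SetFormat, Integer, %oldFormat%       ; restore the caller's format
   return hex
}

; Possible usage, with the file saved as UTF-8 without BOM:
; utf8String := "Buy this ąţ 75¢"
; Ansi2Unicode( utf8String, wString, 65001 )
; SendInputU( WStrToHex( wString ) )
```

Because each code unit is read as a number rather than dumped byte by byte, the digits come out most significant first, which is the order SendInputU parses with "0x".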


Posted: August 29th, 2008, 3:12 am

Joined: April 18th, 2008, 7:57 am
Posts: 1390
Location: The Interwebs
SendU() can be found in the 5th post of the original link I gave you, and was posted by Laszlo. It only sends one character at a time, but this works fine for my purposes. It, as well as other modes, can also be found at the Extended SendU() link I posted.



Also, using the Ansi2Unicode() function you posted, I get that wString contains only the letter B. Which seems strange.


Posted: August 29th, 2008, 3:20 am

Joined: April 6th, 2008, 5:37 am
Posts: 13
So Ansi2Unicode works, I guess, but AHK can't process the string, because it contains zero bytes: in UTF-16LE, "B" is the bytes 42 00, and AHK treats the first zero byte as the end of the string. Grumble.


Posted: August 29th, 2008, 5:43 am

Joined: April 6th, 2008, 5:37 am
Posts: 13
So I've been reading about all the MCode, and the follow-on Hex2Bin stuff that Laszlo did, and maybe it can process the Unicode...

In fact, it can, but SendInputU assumes big-endian hex (UTF-16BE) and Bin2Hex of the converted string gives UTF-16LE! So some strange characters come out the other end!


Posted: August 29th, 2008, 6:36 am

Joined: April 6th, 2008, 5:37 am
Posts: 13
OK, here is the final solution, cobbled together from various other pieces, and even a small bit of my own coding. Many thanks to Laszlo for doing some fancy ASM coding, and Krogdor for talking me through it and pointing out some of the pieces.

The problem:

Given a file in UTF-8 (no BOM, as produced by emacs when told to save in UTF-8), provide a simple-to-use, functional replacement for hotstrings like the following:

::bt@::Buy this ąţ 75¢

The above doesn't work, because AHK turns all the UTF-8 characters into multiple 8-bit characters instead.
This solution provides the following syntax:

:*:bt@::
SendUTF8( "Buy this ąţ 75¢" )
Return

which works by reinterpreting AHK's interpretation of the string as being 8-bit bytes, redecoding it as UTF-8 instead (using a variation of Ansi2Unicode found elsewhere on this forum), then converting that to hex using Bin2Hex (found elsewhere on this forum), and then using SendInputUo (a variation of SendInputU found elsewhere on this forum).

Ansi2Unicode had to be modified to return the length of the converted string, in UTF-16 code units (including the terminating null). Bin2Hex needed no modification. SendInputU took strings of hex UTF-16BE, but the preceding functions here produce strings of hex UTF-16LE, so SendInputUo was changed to consume UTF-16LE instead.

Now that I see more about how some of this stuff works, it might be possible to bypass the hex conversion, and have SendInputUo consume binary directly, via NumGet.

Code:
:*:bt@::
SendUTF8( "Buy this ąţ 75¢" )
Return

SendUTF8( utf8String ) ; 4 DllCalls/char + 4
{
  sz := Ansi2Unicode( utf8String, wString, 65001 )
  hexString := bin2Hex( &wString, sz*2-2 )
  ; debug  sz2 := StrLen( hexString )
  ; debug  SendInput, %sz% - %sz2% - %hexString% -
  SendInputUo( hexString )
}

SendInputUo( p_text ) ; 4 DllCalls/char + 1, reduced from 20/char + 1
{
   event_count := ( StrLen(p_text)//4 )*2
   VarSetCapacity( events, 28*event_count, 0 )
   base = 0
   Loop % event_count//2
   {
      StringMid code1, p_text, 4*A_Index-3, 2
      StringMid code2, p_text, 4*A_Index-1, 2
      code = 0x4%code2%%code1%

      DllCall("RtlFillMemory", "uint", &events + base, "uint",1, "uint", 1)
      DllCall("ntoskrnl.exe\RtlFillMemoryUlong", "uint",&events+base+6, "uint",4, "uint",code)
      base += 28

      DllCall("RtlFillMemory", "uint", &events + base, "uint",1, "uint", 1)
      DllCall("ntoskrnl.exe\RtlFillMemoryUlong", "uint",&events+base+6, "uint",4, "uint",code|(2<<16))
      base += 28
   }
   result := DllCall( "SendInput", "uint", event_count, "uint", &events, "int", 28 )
   if ( ErrorLevel or result < event_count )
      MsgBox SendInput failed`nErrorLevel = %ErrorLevel%`n%result% events of %event_count%
}

Ansi2Unicode(ByRef sString, ByRef wString, CP = 0)
{
     nSize := DllCall("MultiByteToWideChar"
      , "Uint", CP
      , "Uint", 0
      , "Uint", &sString
      , "int",  -1
      , "Uint", 0
      , "int",  0)

   VarSetCapacity(wString, nSize * 2)

   DllCall("MultiByteToWideChar"
      , "Uint", CP
      , "Uint", 0
      , "Uint", &sString
      , "int",  -1
      , "Uint", &wString
      , "int",  nSize)

   return nSize
}

Bin2Hex(addr,len) { ; Bin2Hex(&x,4)
   Static fun
   If (fun = "")
      Hex2Bin(fun,"8B4C2404578B7C241085FF7E2F568B7424108A06C0E8042C0A8AD0C0EA05"
      . "2AC2044188018A06240F2C0A8AD0C0EA052AC2410441468801414F75D75EC601005FC3")
   VarSetCapacity(hex,2*len+1)
   dllcall(&fun, "uint",&hex, "uint",addr, "uint",len, "cdecl")
   VarSetCapacity(hex,-1) ; update StrLen
   Return hex
}

Hex2Bin(ByRef bin, hex) { ; Hex2Bin(fun,"8B4C24") = MCode(fun,"8B4C24")
   Static fun
   If (fun = "") {
      h:="568b74240c8a164684d2743b578b7c240c538ac2c0e806b109f6e98ac802cac0e104880f8"
       . "a164684d2741a8ac2c0e806b309f6eb80e20f02c20ac188078a16474684d275cd5b5f5ec3"
      VarSetCapacity(fun,StrLen(h)//2)
      Loop % StrLen(h)//2
         NumPut("0x" . SubStr(h,2*A_Index-1,2), fun, A_Index-1, "Char")
   }
   VarSetCapacity(bin,StrLen(hex)//2)
   dllcall(&fun, "uint",&bin, "Str",hex, "cdecl")
}


Posted: August 29th, 2008, 7:04 am

Joined: April 6th, 2008, 5:37 am
Posts: 13
OK, the shorter, non-hex solution is here. SendInputUo has been further modified, and the Bin2Hex and Hex2Bin stuff is no longer needed. New dependency on NumGet in AHK 1.0.47+ instead. If you must use an older version, the previous solution may work.

Code:
:*:bt@::
SendUTF8( "Buy this ąţ 75¢" )
Return

SendUTF8( utf8String ) ; 4 DllCalls/char + 3
{
  sz := Ansi2Unicode( utf8String, wString, 65001 ) - 1
  SendInputUo( wString, sz )
}

SendInputUo( byRef p_text, numchars ) ; 4 DllCalls/char + 1, reduced from 20/char + 1
{
   event_count := numchars * 2
   VarSetCapacity( events, 28*event_count, 0 )
   base = 0
   Loop %numchars%
   {
      code := NumGet( p_text, 2*A_Index-2, "UShort")

      DllCall("RtlFillMemory", "uint", &events + base, "uint",1, "uint", 1)
      DllCall("ntoskrnl.exe\RtlFillMemoryUlong", "uint",&events+base+6, "uint",4, "uint",code + 0x40000 )
      base += 28

      DllCall("RtlFillMemory", "uint", &events + base, "uint",1, "uint", 1)
      DllCall("ntoskrnl.exe\RtlFillMemoryUlong", "uint",&events+base+6, "uint",4, "uint",code + 0x60000 )
      base += 28
   }
   result := DllCall( "SendInput", "uint", event_count, "uint", &events, "int", 28 )
   if ( ErrorLevel or result < event_count )
      MsgBox SendInput failed`nErrorLevel = %ErrorLevel%`n%result% events of %event_count%
}

Ansi2Unicode(ByRef sString, ByRef wString, CP = 0)
{
     nSize := DllCall("MultiByteToWideChar"
      , "Uint", CP
      , "Uint", 0
      , "Uint", &sString
      , "int",  -1
      , "Uint", 0
      , "int",  0)

   VarSetCapacity(wString, nSize * 2)

   DllCall("MultiByteToWideChar"
      , "Uint", CP
      , "Uint", 0
      , "Uint", &sString
      , "int",  -1
      , "Uint", &wString
      , "int",  nSize)

   return nSize
}


Posted: August 29th, 2008, 7:50 am

Joined: April 18th, 2008, 7:57 am
Posts: 1390
Location: The Interwebs
Sweet! That's an awesome solution. Thanks for posting it (:


Posted: February 18th, 2009, 6:07 pm
This is a great solution for 2-byte Unicode characters. Thanks a ton!


Posted: January 17th, 2011, 5:27 pm

Joined: July 20th, 2009, 6:01 am
Posts: 165
Location: Amsterdam
Thank you so much, Glenn!! This makes my day.

