[SOLVED] Capture StdOut in UTF-8 format

Posted: 09 Mar 2015, 14:54
by boiler
I'm using SKAN's code to capture StdOut to a variable:
http://www.autohotkey.com/board/topic/9 ... douttovar/
That thread has questions that haven't been answered in over a year, so it looks like a dead thread. I'm hoping someone on this forum can help.

My issue is that when I use it, the output contains gibberish for the UTF-8 characters. It's the same output you would get running it in a console window. I know the output contains correct UTF-8 characters when I direct it to a file instead, but I don't want to have to access the hard drive for performance reasons (it will be running hundreds of commands in a row). I'd like to capture it directly to a variable.

I tried prefixing the usual command line with chcp 65001 & so that the console code page would change to UTF-8 before the command executes. The function returns nothing when I do that, so it seems it can't handle compound command lines, even though they work in the console. (In the console, the command reports "not enough memory" when I run it in UTF-8 mode, and even that message doesn't show up in the function's output, which is why I think it can't handle commands joined with &.)
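One likely reason for the empty result: & is an operator of the cmd.exe shell, and CreateProcess starts the named program directly without going through a shell, so the compound line is never split into two commands. A minimal sketch of routing the line through the command interpreter is below. The command in userCmd is a made-up placeholder, and whether the child program actually emits UTF-8 after chcp 65001 depends on that program, so treat this as something to try rather than a confirmed fix.

```autohotkey
; AutoHotkey v1.1 sketch. userCmd is a hypothetical placeholder command.
userCmd := "ping -n 1 localhost"

; ComSpec holds the path to cmd.exe. /c runs the rest of the line and exits.
; ">nul" hides the "Active code page" message that chcp prints.
fullCmd := ComSpec . " /c chcp 65001 >nul & " . userCmd

MsgBox % fullCmd   ; this string is what would be passed to StdOutToVar
```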

I tried changing the DllCall for CreateProcess to its Unicode equivalent, CreateProcessW, but that also returned an empty string.

Does anyone know how to have StdOut produce UTF-8 output so it can be captured in a variable with the correct characters?

Btw, I also tried to convert the actual output to UTF-8 character by character, but the two examples of 2-byte UTF-8 characters I looked at were each output as two nonsensical UTF-8 characters. They follow the UTF-8 format, in that the first byte's high-order bits indicate how many bytes the character has, and the second byte (and the third, when applicable) has its high-order bits set as they should be.

Re: Capture StdOut in UTF-8 format

Posted: 09 Mar 2015, 14:58
by Coco
Untested:

Code:

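output := StdOutToVar(cmd)                    ; text as already decoded by StdOutToVar
VarSetCapacity(buf, StrPut(output, "UTF-8"))  ; StrPut with no target returns the needed size in bytes
StrPut(output, &buf, "UTF-8")                 ; encode the string into buf as UTF-8
MsgBox % StrGet(&buf, "UTF-8")                ; decode buf as UTF-8 and display it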

Re: Capture StdOut in UTF-8 format

Posted: 09 Mar 2015, 19:51
by lexikos
Those simplified StdOutToVar functions assume the output is CP850. You would need to replace "CP850" with "UTF-8", or add a parameter to the function.
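A minimal sketch of the "add a parameter" option, assuming AutoHotkey v1.1 and assuming the function decodes each chunk read from the pipe with StrGet. DecodeChunk and its encoding parameter are hypothetical names used for illustration only, not part of any posted StdOutToVar. The two bytes written into buf stand in for data read from the pipe.

```autohotkey
; Hypothetical helper: decode nBytes at address pBuf with a caller-chosen encoding.
; The default stays "CP850" so existing callers would behave as before.
DecodeChunk(pBuf, nBytes, encoding := "CP850") {
    return StrGet(pBuf, nBytes, encoding)
}

; Stand-in for pipe data: the UTF-8 encoding of "é" (bytes 0xC3 0xA9).
VarSetCapacity(buf, 2, 0)
NumPut(0xC3, buf, 0, "UChar")
NumPut(0xA9, buf, 1, "UChar")

MsgBox % DecodeChunk(&buf, 2, "UTF-8")
```

One caveat with this approach: if the output is decoded chunk by chunk, a multi-byte UTF-8 sequence that happens to straddle two reads could be decoded incorrectly, so collecting the raw bytes and decoding once at the end may be safer for large outputs.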

Re: Capture StdOut in UTF-8 format

Posted: 09 Mar 2015, 20:44
by boiler
Thank you both for your replies. I love this forum.

Coco: I don't think that will work in this case, because the output from StdOutToVar isn't just misinterpreted UTF-8 bytes; it is actually a different set of bytes (more and different bytes representing the same character). What should be a single 2-byte UTF-8 character is instead replaced by two separate 2-byte characters, and in some cases by a 2-byte character and a 3-byte character. I would try it anyway, but lexikos's change already makes the output from the function correct.
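The byte counts described above can be reproduced with a small sketch, assuming a Unicode build of AutoHotkey v1.1. It takes the two UTF-8 bytes of "é" as a sample (my choice, not one of the characters from the actual output), decodes them once as CP850 and once as UTF-8, and reports how many characters and how many UTF-8 bytes each result contains.

```autohotkey
; Raw UTF-8 bytes for "é" (U+00E9): 0xC3 0xA9.
VarSetCapacity(raw, 2, 0)
NumPut(0xC3, raw, 0, "UChar")
NumPut(0xA9, raw, 1, "UChar")

wrong := StrGet(&raw, 2, "CP850")   ; each byte is treated as its own character
right := StrGet(&raw, 2, "UTF-8")   ; both bytes are read as one character

; StrPut with only an encoding returns the required size in bytes,
; including the null terminator, so subtract 1 to get the payload size.
msg := "CP850 decode: " . StrLen(wrong) . " chars, " . (StrPut(wrong, "UTF-8") - 1) . " bytes as UTF-8`n"
msg .= "UTF-8 decode: " . StrLen(right) . " chars, " . (StrPut(right, "UTF-8") - 1) . " bytes as UTF-8"
MsgBox % msg
```

In CP850, 0xC3 maps to a box-drawing character that needs 3 bytes in UTF-8 and 0xA9 maps to ® which needs 2, so the CP850 line should show 2 characters and 5 bytes against 1 character and 2 bytes for the UTF-8 line. Because the mis-decoded string holds different characters, re-encoding it as UTF-8 cannot recover the original bytes.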

Lexikos: That worked perfectly! This is going to improve my app so much. Thanks very much!