
FileAppend Unicode text


14 replies to this topic
Camarade_Tux
  • Members
  • 35 posts
  • Last active: Feb 11 2007 08:06 AM
  • Joined: 05 Jun 2006
Hi everyone. :)

A few weeks ago I bothered you with my variables that had a dot in their names. I finally solved it (OK, in fact I rewrote my script from scratch :lol: ). I think there were two problems: I was calling a function with a parameter like foo.bar or %foo%.%bar% instead of foo "." bar, and I was storing the value of each key of my INI file in a variable named after that key (e.g. description.0.1.2.0.1=kiki).
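A minimal sketch of the concatenation fix, assuming AutoHotkey v1.1 expression syntax (where `foo.bar` would be read as an object member access, not as a dotted name). `BuildKey` is a hypothetical helper for illustration and is not part of the script below.

```autohotkey
; Build the dotted INI key by concatenating its parts instead of writing foo.bar
key := BuildKey("description", ".0.1", 2)   ; key is now "description.0.1.2"

; Hypothetical helper: key prefix + parent level + "." + index
BuildKey(name, level, index)
{
    return name level "." index   ; operands separated by spaces are concatenated
}
```

The script below does the same thing inline, e.g. `Write(oldlevel "." oldrank2, ...)`.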


For the same script, I need to append text in Unicode format. How can I do that?
I saw Transform, Unicode, but it doesn't seem to serve my purpose.

Thanks.




PS: here is my script, which is working now. Woot!
It's probably of no interest to you, but I wanted to post it. ;)
By the way, it lacks a UI. ;)
max=80
source:=A_ScriptDir "\install.ini"
out:=A_ScriptDir "\out.ini"

Loop, Read, %A_ScriptDir%\sectionlist.txt ;parses sections to sort
{
	section:=A_LoopReadLine
	Sort()
}

Sort(oldlevel="", newlevel="") { ;the main function, recursively called
	Global max, source, out, section
	Loop, %max% {
		i:=A_Index - 1 ;WIHU's first index is 0 whereas AHK's one is 1
		IniRead, iuv, %source%, %section%, description%oldlevel%.%i%, %A_Space% ;retrieves sections description
		if (iuv="") ;skips indexes that don't exist
			continue
		if (Names) ;appends a semicolon and the section name (iuv) with its old index (@i) to the list, except if ...
			Names:=Names ";" iuv "@" i
		else ;... except if the list is empty (a leading semicolon would create an empty first entry)
			Names:=iuv "@" i
	}
	Sort, Names, D`;
	StringSplit, Names, Names, `;
	Loop, %Names0% {
		i:=A_Index - 1 ;same as before : WIHU's first index is 0 whereas AHK's one is 1
		Stringsplit, oldrank, Names%A_Index%, `@ ;retrieves old index
		Write(oldlevel "." oldrank2, newlevel "." i, "description") ;calls the write function for each key of the entry
		Write(oldlevel "." oldrank2, newlevel "." i, "command")
		Write(oldlevel "." oldrank2, newlevel "." i, "selected")
		Write(oldlevel "." oldrank2, newlevel "." i, "hidden")
		Write(oldlevel "." oldrank2, newlevel "." i, "collapsed")
		Write(oldlevel "." oldrank2, newlevel "." i, "locked")
		Write(oldlevel "." oldrank2, newlevel "." i, "disabled")
		Write(oldlevel "." oldrank2, newlevel "." i, "group")
		Write(oldlevel "." oldrank2, newlevel "." i, "flags")
		Write(oldlevel "." oldrank2, newlevel "." i, "workdir")
		Write(oldlevel "." oldrank2, newlevel "." i, "helptext")
		Write(oldlevel "." oldrank2, newlevel "." i, "ext_creator_switchtype")
		IniRead, iuv, %source%, %section%, description%oldlevel%.%oldrank2%.0, %A_Space% ;is there a deeper level to sort?
		if (iuv!="")
			Sort(oldlevel "." oldrank2, newlevel "." i) ;recursive call when there is a deeper level
	}
}

FileRead, sorted, %out%
StringReplace, sorted, sorted, description, `r`ndescription, All
StringReplace, sorted, sorted, ]`r`n, ], All
FileAppend, %sorted%, %A_ScriptDir%\sorted.ini
ExitApp

Write(old, new, var) { ;the write function
	global max, source, out, section
	IniRead, iuv, %source%, %section%, %var%%old%, %A_Space% ;retrieves the old value
	if (iuv!="")
		IniWrite, %iuv%, %out%, %section%, %var%%new% ;writes the new value
}
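The sorting trick in Sort() (tag each description with its old index, sort the list, then split the old index back out) can be seen in isolation in this small sketch. The sample descriptions are made up; like the script, it assumes descriptions never contain `;` or `@`, since those characters are used as separators.

```autohotkey
; Each entry is "<description>@<old index>", joined with semicolons
names := "Zip tools@0;Browsers@1;Games@2"
Sort, names, D`;                      ; alphabetical (case-insensitive), ";" as delimiter
; names is now "Browsers@1;Games@2;Zip tools@0"

newOrder := ""
Loop, Parse, names, `;
{
    entry := A_LoopField
    StringSplit, part, entry, @       ; part1 = description, part2 = old index
    newOrder .= part2 "->" (A_Index - 1) " "   ; old index -> new (zero-based) index
}
; newOrder is now "1->0 2->1 0->2 "
```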

PS2: sometimes a sheet of paper is better than a computer to code something.

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005
#Include BinReadWrite.ahk	; http://www.autohotkey.com/forum/viewtopic.php?t=7549

filename = C:\tmp\UnicodeText.txt
textToConvert = This îs à simplé tèxt to çonvêrt tô Ünïcödë

; Convert the Ansi text to Unicode
textLength := StrLen(textToConvert)
uniLength := textLength * 2	; size of the UTF-16 buffer in bytes
VarSetCapacity(uniText, uniLength, 0)

DllCall("SetLastError", "UInt", 0)
r := DllCall("MultiByteToWideChar"
      , "UInt", 0            ; CodePage: CP_ACP=0 (current Ansi), CP_UTF7=65000, CP_UTF8=65001
      , "UInt", 0            ; dwFlags
      , "Str", textToConvert ; LPCSTR lpMultiByteStr
      , "Int", textLength    ; cbMultiByte: -1=null terminated
      , "UInt", &uniText     ; LPWSTR lpWideCharStr (output buffer)
      , "Int", textLength)   ; cchWideChar: buffer size in WCHARs (not bytes), 0 to get required size

;~ MsgBox % DumpDWORDs(uniText, textLength * 2, true)

; Write it as binary blob to a file
fh := OpenFileForWrite(filename)
If (ErrorLevel != 0)
{
	MsgBox 16, Test, Can't open file '%filename%': %ErrorLevel%
	Exit
}
WriteInFile(fh, uniText, uniLength)
If (ErrorLevel != 0)
{
	MsgBox 16, Test, Can't write in file '%filename%': %ErrorLevel%
	Exit
}
CloseFile(fh)

; Let's re-read this file!
FileGetSize fileSize, %filename%
FileRead fileBuffer, %filename%

textSize := fileSize / 2	; number of WCHARs in the file
VarSetCapacity(ansiText, textSize, 0)

DllCall("SetLastError", "UInt", 0)
r := DllCall("WideCharToMultiByte"
      , "UInt", 0           ; CodePage: CP_ACP=0 (current Ansi), CP_UTF7=65000, CP_UTF8=65001
      , "UInt", 0           ; dwFlags
      , "Str", fileBuffer   ; LPCWSTR lpWideCharStr
      , "Int", textSize     ; cchWideChar: size in WCHAR values, -1=null terminated
      , "Str", ansiText     ; LPSTR lpMultiByteStr
      , "Int", textSize     ; cbMultiByte: 0 to get required size
      , "UInt", 0           ; LPCSTR lpDefaultChar
      , "UInt", 0)          ; LPBOOL lpUsedDefaultChar

MsgBox %ansiText%`n---`n=> %ErrorLevel% / %A_LastError% / %r%


vPhiLho := RegExReplace("Philippe Lhoste", "^(\w{3})\w*\s+\b(\w{3})\w*$", "$1$2")

bugmenot
  • Members
  • 13 posts
  • Last active: Dec 10 2008 01:47 AM
  • Joined: 03 Jul 2006
Hi PhiLho, thanks for all your efforts, but DumpDWORDs is missing here. First I copied the code, then I downloaded all the dependencies, but it still doesn't work. Can you tell me where I can find the include that contains DumpDWORDs?

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005
You may not have noticed it, but it was in a commented-out line. ;-)
It is not essential, just a debug feature.
I gave the code somewhere in this forum, but being lazy, I will just point to the file I uploaded: it is in DllCallStruct.ahk. This file includes another: BinaryEncodingDecoding.ahk.

Camarade_Tux
  • Members
  • 35 posts
  • Last active: Feb 11 2007 08:06 AM
  • Joined: 05 Jun 2006
Thanks for your code.

First, a comment: a Unicode file should have a BOM; http://en.wikipedia....Byte_Order_Mark
(Notepad++ doesn't like Unicode without it, btw)

Then, it's not totally working. :(
Here is a sample file which contains accents at line 13 : http://rapidshare.de... ... d.ini.html (sorry for the unfriendly host, I wanted to keep the file properties)
It was created with IniWrite and I added/removed some linefeeds with FileRead -> StringReplace -> FileAppend.

I read the file with ReadFromFile() or FileRead and I get "Ãç" (approx.) characters instead of my accents. :cry:

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005

> First, a comment: a Unicode file should have a BOM; http://en.wikipedia....Byte_Order_Mark
> (Notepad++ doesn't like Unicode without it, btw)

SHOULD: it is not mandatory per the standard. But as you point out (and Wikipedia with you), some Windows apps seem to need it. That said, I made a test with Notepad (WinXP Pro SP2), and it opened the file I created without a BOM.
Perhaps it now uses the IsTextUnicode function to detect it. A program like SciTE doesn't try to guess: if it doesn't see the BOM, it doesn't know the file is Unicode.

> Then, it's not totally working. :(

Well, allocate two more bytes, put the BOM at the start, and give the buffer address + 2 to the DllCall.
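A minimal sketch of that suggestion, assuming AutoHotkey v1.1 (for `NumPut`, the `AStr`/`Ptr` DllCall types and `FileOpen`) and a Western ANSI code page such as Windows-1252, where each character of the sample text is one byte. The file name under `A_Temp` is arbitrary.

```autohotkey
text := "Voil" Chr(0xE0)                 ; "Voilà"
len := StrLen(text)
VarSetCapacity(buf, 2 + len * 2, 0)      ; two extra bytes for the BOM
NumPut(0xFEFF, buf, 0, "UShort")         ; BOM U+FEFF, stored as FF FE (little-endian)

; Convert the ANSI text into the buffer, starting right after the BOM
chars := DllCall("MultiByteToWideChar"
    , "UInt", 0                          ; CodePage: CP_ACP (current ANSI code page)
    , "UInt", 0                          ; dwFlags
    , "AStr", text                       ; lpMultiByteStr
    , "Int", len                         ; cbMultiByte: source bytes, no terminator
    , "Ptr", &buf + 2                    ; lpWideCharStr: buffer address + 2
    , "Int", len)                        ; cchWideChar: room left, in WCHARs

file := FileOpen(A_Temp "\bom_demo.txt", "w")
file.RawWrite(buf, 2 + chars * 2)        ; BOM + UTF-16LE text, no terminator
file.Close()
```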

> Here is a sample file which contains accents at line 13: http://rapidshare.de... ... d.ini.html (sorry for the unfriendly host, I wanted to keep the file properties)

Unfriendly is the word: several clicks to get it, and two pop-unders even with Firefox. I really wanted to see the file, otherwise I would have given up! If you plan to share more files, I advise you to use AutoHotkey.net (see the sticky in the Scripts section).

> I read the file with ReadFromFile() or FileRead and I get "Ãç" (approx.) characters instead of my accents. :cry:

Well, your file isn't in Windows' Unicode (UCS-2, aka UTF-16) but in UTF-8. These Ã characters are typical...
I don't understand how you created this file; AFAIK, UTF-8 isn't a native Windows format, although it is supported by the MultiByte/WideChar conversion functions.
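The typical "Ã" pattern is easy to reproduce: encode a string as UTF-8, then decode the same bytes as Windows-1252. A sketch, assuming the Unicode build of AutoHotkey v1.1 (for `StrPut`/`StrGet`):

```autohotkey
text := Chr(0xE9)                        ; "é", U+00E9
size := StrPut(text, "UTF-8")            ; bytes needed, terminator included: 3
VarSetCapacity(buf, size, 0)
StrPut(text, &buf, "UTF-8")              ; buf now holds C3 A9 00
garbled := StrGet(&buf, "CP1252")        ; byte C3 -> "Ã", byte A9 -> "©"
; garbled is now "Ã©"
```

Every accented Latin-1 letter starts with byte C2 or C3 in UTF-8, which is why an "Ã" (or "Â") shows up in front of each misread accent.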

I uploaded an updated version of ReadWriteUnicodeText.ahk which adds (and skips) a BOM. Note that I had to add `r to the ends of lines in a continuation section (I could have used Join`r`n) to get correct EOLs (understood by Notepad...).
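For reference, a continuation section using the Join option mentioned above; here the lines are joined with CR+LF instead of the default LF (a minimal sketch; the INI text is made up):

```autohotkey
; Lines inside the parentheses are joined with `r`n instead of the default `n
text =
(Join`r`n
[Section]
key=value
)
; text is now "[Section]`r`nkey=value"
```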

Last note: you don't need Unicode to write INI files with French accents in them, AFAIK. These accents are in the Ansi default codepage of Windows.

Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005
If the text contains only ANSI characters (codepage 0), isn't it simpler to insert/remove a \0 in front of each character code?
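A sketch of this shortcut, assuming AutoHotkey v1.0.47+ for `NumPut`/`NumGet`. Writing each character code as a 16-bit little-endian value is the same as following each byte with a \0. It is only exact for characters in the Latin-1 range (U+0000 to U+00FF): the Windows-1252 characters stored at 0x80 to 0x9F, such as € (U+20AC), map elsewhere in Unicode and still need MultiByteToWideChar. `Latin1ToUtf16LE` is a hypothetical name.

```autohotkey
n := Latin1ToUtf16LE("A" Chr(0xE9), buf)    ; "Aé" -> bytes 41 00 E9 00, n = 4

; Hypothetical helper: zero-extends each character code to 16 bits (UTF-16LE).
; Returns the number of bytes written to buf.
Latin1ToUtf16LE(text, ByRef buf)
{
    len := StrLen(text)
    VarSetCapacity(buf, len * 2, 0)
    Loop, Parse, text                        ; no delimiter: one character per iteration
        NumPut(Asc(A_LoopField), buf, (A_Index - 1) * 2, "UShort")
    return len * 2
}
```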

Camarade_Tux
  • Members
  • 35 posts
  • Last active: Feb 11 2007 08:06 AM
  • Joined: 05 Jun 2006

>> First, a comment: a Unicode file should have a BOM; http://en.wikipedia....Byte_Order_Mark
>> (Notepad++ doesn't like Unicode without it, btw)

> SHOULD: it is not mandatory per the standard. But as you point out (and Wikipedia with you), some Windows apps seem to need it. That said, I made a test with Notepad (WinXP Pro SP2), and it opened the file I created without a BOM.
> Perhaps it now uses the IsTextUnicode function to detect it. A program like SciTE doesn't try to guess: if it doesn't see the BOM, it doesn't know the file is Unicode.

This is probably because Windows' Notepad and Notepad2 only run on Windows, which always means little-endian, whereas SciTE/Scintilla can also run on *nix.
(Note: I have never played with a big-endian box.)

>> Here is a sample file which contains accents at line 13: http://rapidshare.de... ... d.ini.html (sorry for the unfriendly host, I wanted to keep the file properties)

> Unfriendly is the word: several clicks to get it, and two pop-unders even with Firefox. I really wanted to see the file, otherwise I would have given up! If you plan to share more files, I advise you to use AutoHotkey.net (see the sticky in the Scripts section).

It seems they have added many ads recently.
Also, since more and more ads rely on external JavaScript, it is possible that my HOSTS file blocked some of them.
Blocking content.yieldmanager.com and ad.firstadsolution.com should prevent at least two pop-unders.

>> I read the file with ReadFromFile() or FileRead and I get "Ãç" (approx.) characters instead of my accents. :cry:

> Well, your file isn't in Windows' Unicode (UCS-2, aka UTF-16) but in UTF-8. These Ã characters are typical...
> I don't understand how you created this file; AFAIK, UTF-8 isn't a native Windows format, although it is supported by the MultiByte/WideChar conversion functions.

I uploaded my script so you can see it, because I really can't reproduce that :?
https://ahknet.autoh... ... Sorter.txt

> Last note: you don't need Unicode to write INI files with French accents in them, AFAIK. These accents are in the Ansi default codepage of Windows.

My script is a sorter for WIHU: http://www.kalytta.com/wihu.php
(typically the config files for this app are a few thousand lines long :shock: , and mine is probably going to get much longer now that I can sort it easily.)
WIHU needs the config file to be in Unicode, otherwise it can't display accents. :(

Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005
PhiLho's dll calls can be very useful for converting between UTF-8 and Unicode. Copy some text to the Clipboard in a Unicode application (IE, Word, Notepad, WordPad...) and the Alt-Z hotkey shows the conversion results. (We can get UTF-8 to an AHK script through "Transform x, Unicode" from the clipboard, and with these dll's we can process it.) Again, converting between ANSI and Unicode is just a matter of inserting/removing \0 bytes.
!z::
   Transform text, Unicode ; text <- UTF-8 Transformed Clipboard
   MsgBox UTF-8:`n%text%

   ; Convert UTF-8 text to Unicode
   Len := StrLen(text)
   VarSetCapacity(UText, 2*Len+2, 0) ; worst case size 8 -> 16-bit char

   ; The call returns the WCHARs written, terminator included -> ULen = bytes without it
   ULen:= -2+2*DllCall("MultiByteToWideChar"
      , UInt, 65001        ; CodePage: CP_ACP=0 (current Ansi), CP_UTF7=65000, CP_UTF8=65001
      , UInt, 0            ; dwFlags
      , Str, text          ; LPCSTR lpMultiByteStr
      , Int, -1            ; cbMultiByte: Len or -1 (= null terminated)
      , UInt, &UText       ; LPWSTR lpWideCharStr
      , Int, Len+1)        ; cchWideChar: buffer size in WCHARs (Len + terminator), or 0 (= get required size)

   MsgBox % "Little Endian Unicode:`n"Bin2Hex(UText,ULen) "`n" ULen " bytes"

   ; Convert Unicode text to UTF-8
   VarSetCapacity(AText, Len+1, 0) ; Len bytes + terminator

   r := DllCall("WideCharToMultiByte"
      , UInt, 65001       ; CodePage: CP_ACP=0 (current Ansi), CP_UTF7=65000, CP_UTF8=65001
      , UInt, 0           ; dwFlags
      , Str, UText        ; LPCWSTR lpWideCharStr
      , Int,  -1          ; cchWideChar: size in WCHAR values: Len or -1 (= null terminated)
      , Str, AText        ; LPSTR lpMultiByteStr
      , Int, Len+1        ; cbMultiByte: buffer size incl. terminator, or 0 (= get required size)
      , UInt, 0           ; LPCSTR lpDefaultChar
      , UInt, 0)          ; LPBOOL lpUsedDefaultChar

   MsgBox Convert back to UTF-8:`n%AText%
Return

Bin2Hex(ByRef b, n=0)            ; n bytes binary data -> stream of 2-digit hex
{                                ; n = 0: all (SetCapacity can be larger than used!)
   format = %A_FormatInteger%    ; save original integer format
   SetFormat Integer, Hex        ; for converting bytes to hex

   m := VarSetCapacity(b)
   If (n < 1 or n > m)           ; invalid length -> all allocated
       n := m
   Loop %n%
      h := h 256+*(&b+A_Index-1) ; concatenate  0x1xx
   StringReplace h, h, 0x1,,All  ; remove every 0x1

   SetFormat Integer, %format%   ; restore original format
   Return h
}


PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005

> I uploaded my script so you can see it, because I really can't reproduce that :?
> https://ahknet.autoh... ... Sorter.txt

You can keep the .ahk extension, you know?
Sorry, it is a bit long; I won't analyze it...

> My script is a sorter for WIHU: http://www.kalytta.com/wihu.php
> (typically the config files for this app are a few thousand lines long :shock: , and mine is probably going to get much longer now that I can sort it easily.)
> WIHU needs the config file to be in Unicode, otherwise it can't display accents. :(

Are you sure that what they call "Unicode" is UTF-16 (Windows' Unicode)? Unicode is a standard defining how characters are mapped to numerical values (code points). Then there is the encoding of those values, which can use 16-bit units (UTF-16, with a pair of units for the rare characters beyond U+FFFF), always 32 bits (UTF-32, uncommon), or a variable number of bytes (UTF-7 or UTF-8). If the file I saw is the result of processing a file generated by WIHU, then it was likely generated as UTF-8.
Note that UTF-8 is AHK-friendly, because these strings are plain ASCII for regular US characters, and higher codes use only bytes of 0x80 and above: never a binary zero nor an ASCII control character.
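A quick way to check this, assuming the Unicode build of AutoHotkey v1.1.17 or later (for `StrPut` and `Format`): dump the UTF-8 bytes of a string mixing ASCII, a Latin-1 accent and a CJK character, and count the zero bytes.

```autohotkey
text := "Abc" Chr(0xE9) Chr(0x4E2D)      ; "Abcé中"
size := StrPut(text, "UTF-8")            ; 3 + 2 + 3 bytes + terminator = 9
VarSetCapacity(buf, size, 0)
StrPut(text, &buf, "UTF-8")

hex := ""
zeros := 0
Loop, % size - 1                         ; skip the terminating zero
{
    b := NumGet(buf, A_Index - 1, "UChar")
    hex .= Format("{:02X} ", b)
    if (b = 0)
        zeros++
}
; hex is now "41 62 63 C3 A9 E4 B8 AD " and zeros is 0:
; the ASCII bytes are unchanged, and the multi-byte sequences only use bytes >= 0x80
```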

Camarade_Tux
  • Members
  • 35 posts
  • Last active: Feb 11 2007 08:06 AM
  • Joined: 05 Jun 2006
*banging his head on the wall*


It was probably something with the app I first used to create my install.ini. It relies on the NINI parser, and it seems that somehow it wrote è, à, é, ç as two bytes each, inside an ANSI file. :?
So on my computer the problem seems to be solved.
However, there are some Chinese users of WIHU I'd like to share my work with.
Also, WIHU is for Windows and only for Windows; AFAIK it uses the WinAPI to read INI files, so I believe its Unicode is UTF-16.

I tried the new ReadWriteUnicode script and it works for the provided string, but I can't get it to work with FileRead, textToConvert, D:\Scratch\Download\AHK\install.ini.
textToConvert is not empty, but nothing is processed and the script exits immediately.

I also used Laszlo's code and I'm unable to write to the file. The file is opened for writing (Windows prevents me from deleting it until I stop the script), but nothing is written to it. Is there something to change when calling a function with a ByRef parameter?

Thanks.

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005

> I tried the new ReadWriteUnicode script and it works for the provided string, but I can't get it to work with FileRead, textToConvert, D:\Scratch\Download\AHK\install.ini.
> textToConvert is not empty, but nothing is processed and the script exits immediately.

I don't understand. If the file is in Windows' Unicode, FileRead can read it (I use it), but you have to convert it with the given API function; otherwise it is just a blob of binary data that AutoHotkey cannot use.
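In AutoHotkey v1.1 that conversion is wrapped by `StrGet`. A minimal sketch, assuming v1.1 (whose `FileAppend` accepts an encoding and writes a BOM when it creates a UTF-16 file); the file name is arbitrary:

```autohotkey
path := A_Temp "\utf16_demo.txt"
FileDelete, %path%
FileAppend, % "Caf" Chr(0xE9), %path%, UTF-16   ; BOM + UTF-16LE text

FileGetSize, size, %path%
FileRead, raw, *c %path%                        ; raw bytes, no decoding
if (NumGet(raw, 0, "UShort") = 0xFEFF)          ; skip the BOM if there is one
    text := StrGet(&raw + 2, (size - 2) // 2, "UTF-16")
else
    text := StrGet(&raw, size // 2, "UTF-16")
; text is now "Café"
```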

David Andersen
  • Members
  • 140 posts
  • Last active: Jun 28 2011 04:54 PM
  • Joined: 15 Jul 2005
I need to run a program and pass it a string in UTF-8 encoding. I have been trying to use PhiLho's script, but I can't make it work. I created a small test program where I use the DOS command Echo to see whether I can make it work with that first:

#SingleInstance Force
#NoEnv	; Recommended for performance and compatibility with future AutoHotkey releases.

textToConvert = øæå « – â ¬ Å“ Å’ — »

textLength := StrLen(textToConvert)
VarSetCapacity(uniText, textLength * 2 + 1, 0) ; Worst case (all Ascii)
r := DllCall("MultiByteToWideChar"
		, "UInt", 65001        ; CodePage: CP_ACP=0 (current Ansi), CP_UTF7=65000, CP_UTF8=65001
		, "UInt", 0            ; dwFlags
		, "Str", textToConvert ; LPSTR lpMultiByteStr
		, "Int", textLength    ; cbMultiByte: -1=null terminated
		, "UInt", &uniText     ; LPCWSTR lpWideCharStr
		, "Int", textLength)   ; cchWideChar: 0 to get required size

Run, %comspec% /c echo :%uniText% > egg.txt, , min

Which writes ":" to the file.

The .exe file that I want to pass the parameter to cannot be changed and because of performance reasons I don't want to write the parameter to a file first and then pass it on to the exe file.

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005
Well, it seems you are converting your UTF-8 string to UTF-16, while you say you need to pass it a string in UTF-8...
Why do you make this conversion?
And your code cannot work, because you use the result as if it were a plain string. But UTF-16 strings are likely to contain zero bytes, so AHK will fail to pass them correctly in your example. You have to process them entirely with DllCalls.
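The zero-byte problem is easy to see with a pure-ASCII string, assuming AutoHotkey v1.1 for `StrPut`/`StrGet`: once encoded as UTF-16, "Abc" becomes 41 00 62 00 63 00, so anything that reads it as a byte string stops after the first character.

```autohotkey
size := StrPut("Abc", "UTF-16")          ; 4 characters including the terminator
VarSetCapacity(buf, size * 2, 0)         ; 2 bytes per UTF-16 code unit
StrPut("Abc", &buf, "UTF-16")            ; buf: 41 00 62 00 63 00 00 00
asBytes := StrGet(&buf, "CP0")           ; read as ANSI: stops at the first 00 byte
; asBytes is now "A"
```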

David Andersen
  • Members
  • 140 posts
  • Last active: Jun 28 2011 04:54 PM
  • Joined: 15 Jul 2005
Thanks PhiLho,

I had the feeling that I was lost; that's why I love using this forum. I will try searching for DLLs to call. However, I really miss a place where I can learn, step by step, how to find the relevant DLLs, understand how they work, and understand how to write the DllCall. I think it is my lack of C++ programming experience that makes it take me so long to get comfortable with DllCalls.