Support for Unicode file and folder names

AutoHotkey Community Forum Index -> Wish List

Poll: If you use Unicode characters in filenames, would AutoHotkey support for such files be of use to you?
Yes: 51% [ 15 ]
No: 10% [ 3 ]
I don't use Unicode characters in file or folder names: 34% [ 10 ]
Other: 3% [ 1 ]
Total Votes : 29

Chris
Site Admin

Joined: 02 Mar 2004
Posts: 10467

Posted: Wed Dec 08, 2004 11:24 pm    Post subject: Support for Unicode file and folder names

Tekl wrote:
it would be nice, if you can add Unicode support for File-Loops.
I checked into this and I don't think there's a quick fix because even if A_LoopFileFullPathUnicode were added as a new built-in variable containing the filename as Unicode, and even if file-pattern loops were fixed to be able to recurse into Unicode directories, none of the other commands understand how to operate on Unicode file names, so all you would be able to do is send the Unicode file and folder names to the clipboard, which seems of very limited use.

If script variables could be flagged internally as to whether they contain Unicode vs. ANSI, commands such as FileMove could be altered to conditionally call the Unicode version of the Windows API to act upon the file. However, this would require fairly extensive changes throughout the program.

In any case, it's still on the to-do list. If you have a preference, please vote in the poll.

Tekl

Joined: 24 Sep 2004
Posts: 813
Location: Germany

Posted: Fri Dec 10, 2004 2:10 pm

Hi Chris,

I use the file-loop to calculate the folder size, so if Unicode support made it possible to calculate the correct value, that would be enough for me. But some day I might need more than just the folder size.

I think it would be better that you generally switch to unicode, but that should not have priority. Maybe it could be an argument for people to switch from AutoIt to AHK.

If there is a way to have the system calculate the folder size, that could be the better way.

Tekl

Chris
Site Admin

Joined: 02 Mar 2004
Posts: 10467

Posted: Fri Dec 10, 2004 2:28 pm

Thanks for the info. In case it's suitable, here is an alternative that uses the Explorer's properties dialog:
Code:
FileSelectFolder, Folder
if Folder =
   return
Run, properties "%Folder%"
WinWait, ahk_class #32770, Size on disk:  ; <<< Might need to be changed if not English.
Sleep, 1000  ; Give it time to calculate the size (replace this with a more reliable method).
ControlGetText, FolderSize, Edit4
WinClose
MsgBox Folder size is %FolderSize%.

Quote:
I think it would be better that you generally switch to unicode, but that should not have priority.
An entirely-Unicode version of AutoHotkey is tentatively planned for the future. It will probably be kept separate from the ANSI version because certain things about Unicode are worse, most notably that every variable would take up twice as much memory.

Having a single version that does both Unicode and ANSI (by means such as a #Unicode directive) is a possibility too, but I suspect that would increase the size of compiled scripts, and reduce performance, because the program would have to include the ANSI and Unicode version of almost every API function and check which one to call at runtime.

Edit: Added more comments to sample script.


Last edited by Chris on Fri Dec 10, 2004 2:34 pm; edited 2 times in total

Tekl

Joined: 24 Sep 2004
Posts: 813
Location: Germany

Posted: Fri Dec 10, 2004 2:32 pm

Hi,

When you store variables in UTF-8, they use almost the same memory if they contain Western-language text.
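
To put rough numbers on that, here is a minimal sketch in Python (not AutoHotkey, since it only illustrates byte counts). The sample strings and the helper name `encoded_sizes` are made up for illustration; cp1252 stands in for "ANSI" and UTF-16 for the double-wide Unicode format.

```python
# Compare how many bytes the same text needs in three encodings.
# The helper name and the sample strings are illustrative assumptions.

def encoded_sizes(text):
    """Return the byte length of text in ANSI (cp1252), UTF-8 and UTF-16."""
    return {
        "cp1252": len(text.encode("cp1252")),
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # little-endian, no byte-order mark
    }

english = "Folder size"
german = "Er\u00f6ffnungsanzeige"  # "Eroeffnungsanzeige" written with a precomposed o-umlaut

for sample in (english, german):
    print(sample, encoded_sizes(sample))
```

For plain English text, UTF-8 and ANSI need the same number of bytes and UTF-16 needs twice as many. Each umlaut costs one extra byte in UTF-8.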

Tekl

Chris
Site Admin

Joined: 02 Mar 2004
Posts: 10467

Posted: Fri Dec 10, 2004 2:37 pm

That's true. However, Unicode apps do not typically store data as UTF-8 internally, since there would be the performance loss of constantly having to convert between UTF-8 and the UTF-16 strings that the Unicode versions of the Windows API functions expect. Also, such conversions would probably add considerable complexity to the program.

My understanding is that typically a compiler switch is used in the source code to determine whether to build a Unicode or ANSI version (after having converted all the source code to support such conditional compilation, which would be a huge but one-time task). The Unicode version then uses double-wide characters for all strings internally.

The advantage to the above isn't just performance. It also allows a single source code base to be used to create both the Unicode and ANSI versions of the program.

Atomhrt

Joined: 02 Sep 2004
Posts: 128
Location: Sunnyvale

Posted: Fri Dec 10, 2004 4:44 pm

I chose "other". I don't use unicode now, but that's not to say I may not have a need in the future.
_________________
I am he of whom he speaks!

Tekl

Joined: 24 Sep 2004
Posts: 813
Location: Germany

Posted: Fri Aug 26, 2005 2:13 pm

Hi,

Is there a way to get IfFileExist to work with this folder?
M:\Jobs\Customer A\J12345 Ero¨ffnungsanzeige

It was created by an OS X client. Since ¨ is ASCII 168, why doesn't it work?
_________________
Tekl

Chris
Site Admin

Joined: 02 Mar 2004
Posts: 10467

Posted: Sat Aug 27, 2005 3:02 am

I think it could be done via DllCall, but it might be somewhat complicated. If you just want to check for the existence of the folder itself, the approach would involve calling GetFileAttributesW, which is the Unicode counterpart of GetFileAttributesA (ANSI).

The problem here is that you have to get Unicode text into a normal AutoHotkey variable somehow. I think it could be done via VarSetCapacity (to make it large enough) followed by MultiByteToWideChar() to convert the UTF-8 encoded version of "M:\Jobs\Customer A\J12345 Ero¨ffnungsanzeige" to a Unicode string in a variable. That variable could then be passed to GetFileAttributesW().
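
The two steps can be sketched as follows. This is Python with ctypes rather than AutoHotkey's DllCall, and it is only an illustration under stated assumptions: `utf8_to_wide` and `folder_exists` are hypothetical helper names, the first imitates what MultiByteToWideChar would produce for UTF-8 input, and the second calls GetFileAttributesW only when running on Windows (elsewhere it falls back to `os.path.isdir` so the sketch still runs).

```python
import ctypes
import os
import sys

INVALID_FILE_ATTRIBUTES = 0xFFFFFFFF  # value GetFileAttributesW returns on failure
FILE_ATTRIBUTE_DIRECTORY = 0x10       # attribute bit that marks a folder


def utf8_to_wide(utf8_bytes):
    """Imitate MultiByteToWideChar(CP_UTF8, ...): UTF-8 in, null-terminated UTF-16LE out."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le") + b"\x00\x00"


def folder_exists(path):
    """Check for a folder through the Unicode (W) API; hypothetical helper."""
    if sys.platform != "win32":
        return os.path.isdir(path)  # fallback so the sketch runs on other systems
    get_attrs = ctypes.windll.kernel32.GetFileAttributesW
    get_attrs.argtypes = [ctypes.c_wchar_p]  # ctypes passes the str as a wide string
    get_attrs.restype = ctypes.c_uint32
    attrs = get_attrs(path)
    return attrs != INVALID_FILE_ATTRIBUTES and bool(attrs & FILE_ATTRIBUTE_DIRECTORY)


# "o" followed by a combining diaeresis (U+0308), as UTF-8 bytes
print(utf8_to_wide(b"o\xcc\x88").hex())
print(folder_exists(os.getcwd()))
```

In an AutoHotkey script the same idea would need a buffer sized with VarSetCapacity before the conversion call, as described above.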

Tekl

Joined: 24 Sep 2004
Posts: 813
Location: Germany

Posted: Sat Aug 27, 2005 11:07 am

Hi,

Why do you think it's a Unicode name? For OS X it's true, they call it decomposed Unicode, but all PCs see only the ANSI character, so it seems that it is not a Unicode filename. Don't the normal non-Unicode routines support high-ASCII characters?
_________________
Tekl

Chris
Site Admin

Joined: 02 Mar 2004
Posts: 10467

Posted: Sat Aug 27, 2005 12:13 pm

Tekl wrote:
Why do you think it's a Unicode name?
When you said earlier, "Is there a way to get IfFileExist to work with this folder?" I thought you were implying that it had a Unicode filename. If the filename isn't Unicode, IfFileExist should already work on it.

Keep in mind that I think a filename can be Unicode even though all of its characters can be expressed in ANSI. Maybe this is only possible when the name of the file or folder contains Unicode characters that happen to look exactly like certain ANSI characters.
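
The decomposed form mentioned earlier in the thread is one way this can happen. The sketch below (Python, for illustration only) assumes the OS X client stored the umlaut as a plain "o" followed by the combining diaeresis U+0308, which is what decomposed (NFD) Unicode does. `fits_in_ansi` is a hypothetical helper and cp1252 stands in for the ANSI code page.

```python
import unicodedata

precomposed = "Er\u00f6ffnungsanzeige"                  # umlaut as a single code point
decomposed = unicodedata.normalize("NFD", precomposed)  # "o" followed by U+0308


def fits_in_ansi(name):
    """Return True if every character of name exists in code page 1252."""
    try:
        name.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


print(len(precomposed), fits_in_ansi(precomposed))
print(len(decomposed), fits_in_ansi(decomposed))
# Recomposing (NFC) gives back the name that an ANSI routine could handle.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Under that assumption the two names look nearly identical on screen, but only the precomposed one can be expressed in ANSI, so an ANSI-only routine would not find the decomposed folder.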

Tekl

Joined: 24 Sep 2004
Posts: 813
Location: Germany

Posted: Sat Aug 27, 2005 2:55 pm

Thanks Chris,

So, instead of wasting time, I'll just wait until we get our new server, which automatically makes correct ANSI names from OS X files. In the meantime I'll try to rename the folders myself.
_________________
Tekl

wOxxOm

Joined: 09 Feb 2006
Posts: 319

Posted: Thu Mar 02, 2006 2:45 pm    Post subject: [Script] :: UNICODE clipboard convert to ANSI function

I use this function only to get Unicode text from the clipboard, so it doesn't have a wrapper to save the clipboard text, which would be useful for doing the conversion anywhere in your code. The example is coded using info from eggheads.experts. Extremely fast and reliable. Tested with Cyrillic (Russian) names only.

Code:
clipAnsi()
{
   transform,ca_Clip,unicode ; get clipboard text in UTF-8
   varSetCapacity(ca_WideText,1000,0) ; allocate buffer for 2-byte-char string (500 chars)
   varSetCapacity(ca_AnsiText,1000,0) ; alloc for resulting ansi string
   ; Convert UTF-8 to UTF-16 (CP_UTF8=65001), then UTF-16 to ANSI (CP_ACP=0).
   ; The length argument -1 means the input string is null-terminated.
   if dllCall("MultiByteToWideChar",uint,65001, uint,0, str,ca_Clip
              , int,-1, str,ca_WideText, uint,500)
      dllCall("WideCharToMultiByte",uint,0, uint,0, str,ca_WideText
              , int,-1, str,ca_AnsiText, uint,500, uint,0, uint,0)
   return ca_AnsiText
}

kapege.de

Joined: 07 Feb 2005
Posts: 186
Location: Munich, Germany

Posted: Fri Mar 03, 2006 9:26 am

Before I vote, I would like to ask whether German umlauts (mutated vowels) need Unicode. I don't think so, so my answer would be "No".
_________________
Peter

Wisenheiming for beginners: KaPeGe (German only, sorry)

bugmenot

Joined: 03 Jul 2006
Posts: 40

Posted: Tue Jul 04, 2006 1:50 pm

Ascii is 4bit and german umlauts aren't part of it

there is an "extended" set which can take the remaining 4 bits to reach the 8 bit limit. you remember codepages from dos? if you have not the right one set up, there were no umlauts.

i think windows uses ansi for this, which might be the same thing, and windows is able to handle multiple ansi code pages, but i am not sure.

anyway i guess there is no problem putting umlauts into unicode so they are interchangeable across the whole world, so to answer your question: could be ;)

anyway i'd like to see complete unicode support for whole scripts. what about that? converting only certain functions leads nowhere. isn't it far better to create a unicode mode or similar? maybe this would run only on win xp/2k?

PhiLho

Joined: 27 Dec 2005
Posts: 6721
Location: France (near Paris)

Posted: Tue Jul 04, 2006 2:50 pm

bugmenot wrote:
Ascii is 4bit and german umlauts aren't part of it

there is an "extended" set which can take the remaining 4 bits to reach the 8 bit limit. you remember codepages from dos? if you have not the right one set up, there were no umlauts.
I believe you are mixing things up...
With 4 bits, you can have only 16 values, which is a bit small to fit the whole occidental alphabet. Even the old TTY code was 5 bits... :)
No, ASCII is 7-bit, and ANSI is 8-bit: a single extra bit doubles the number of codes!

Indeed, I believe all German (and French, and most Western European) accented characters are in the default ANSI code page, a.k.a. CP-1252 or (more or less) ISO-8859-1.
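
A quick way to check these numbers, sketched in Python for illustration (the helper `code_count` and the variable names are made up for the example):

```python
def code_count(bits):
    """Number of distinct codes that fit in the given number of bits."""
    return 2 ** bits


print(code_count(4), code_count(5), code_count(7), code_count(8))  # 16 32 128 256

umlauts = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df"  # a-, o-, u-umlaut (both cases) and sharp s
ansi_bytes = umlauts.encode("cp1252")

# Every umlaut is a single byte above 127, so outside 7-bit ASCII but inside 8-bit ANSI.
print(len(ansi_bytes) == len(umlauts), all(b > 127 for b in ansi_bytes))

# For these characters CP-1252 and ISO-8859-1 agree byte for byte...
print(ansi_bytes == umlauts.encode("latin-1"))

# ...but not everywhere: the euro sign exists in CP-1252 only.
euro_in_cp1252 = "\u20ac".encode("cp1252")
print(euro_in_cp1252)
```

The euro sign is one place where the "more or less" matters: CP-1252 assigns it byte 0x80, while ISO-8859-1 has no euro sign at all.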
_________________
vPhiLho := RegExReplace("Philippe Lhoste", "^(\w{3})\w*\s+\b(\w{3})\w*$", "$1$2")
All times are GMT