AutoHotkey Homepage AutoHotkey Community

FileRead is misbehaving with umlauts, äüöß (RESOLVED)

 
TheLeO
Posted: Sat Aug 08, 2009 12:48 pm

It's kinda weird, and I can't quite figure out why it's happening:

instead of the ßäöü characters, I get

Ã¶ = ö
Ã¼ = ü
ÃŸ = ß
Ã¤ = ä

Here are two examples.

This works great.
Code:

msg = In der Hölle!
msgbox, %msg%



This shows funny:
Code:

FileAppend, In der Hölle!, temp.txt
FileRead, umlaute, temp.txt
MsgBox %umlaute%



---------------------------
iTunesAutoLyrics.ahk
---------------------------
In der HÃ¶lle!!
---------------------------
OK
---------------------------

or in my script it's also often shown as "In der Hölle!"

a lot of the time I get stuff like this:
weiÃŸ
plÃ¶tzlich
HÃ¶lle!
zu mir hÃ¤lt
falschem StÃ¼ck



Any thoughts?
Help or any known workarounds would really be appreciated! :D

Here is the full script, by the way, which really shows the problem: the message box that comes up contains a lot of weird characters instead of the umlauts.


Code:



URLDownloadToFile,http://lyricwiki.org/Farin_Urlaub:Dusche, TEMPiTunesAutoLyricsCurrentsong.xml
Sleep 100
FileRead, RawXMLFileVAR, TEMPiTunesAutoLyricsCurrentsong.xml


Loop, parse, RawXMLFileVAR, `n, `r  ; Specifying `n prior to `r allows both Windows and Unix files to be parsed.
{
  FailedString = There is currently no text in this page
  IfInString, A_LoopField,%FailedString% 
    {
    qtt(":-(")
    Return
    }
  IfInString, A_LoopField,class`='lyricbox'  ;note I'm escaping the equal sign.
    RawLineWithSongLyric = %A_LoopField%

}


Needle = class`='lyricbox'

PositionOfNeedle := InStr(RawLineWithSongLyric, Needle)
FinalPositionOfNeedle := (PositionOfNeedle + 17)

StringTrimLeft, RawlineMinusStuffAtTheBeggining, RawLineWithSongLyric, %FinalPositionOfNeedle%

StringReplace, SongLyricFinal, RawlineMinusStuffAtTheBeggining, <br />, `n, All

msgbox, %SongLyricFinal%


_________________
::
I Have Spoken
::


Last edited by TheLeO on Sun Aug 09, 2009 2:57 pm; edited 1 time in total

SoLong&Thx4AllTheFish
Posted: Sat Aug 08, 2009 1:07 pm

TheLeO wrote:
Code:

FileAppend, In der Hölle!, temp.txt
FileRead, umlaute, temp.txt
MsgBox %umlaute%

This also works fine for me. Make sure the AHK script itself is saved as ANSI, not UTF-8/Unicode. In Notepad++, for example, you can change the encoding via the Format menu. :)
_________________
AHK Wiki FAQ
TF : Text files & strings lib, TF Forum

Sean
Posted: Sat Aug 08, 2009 2:20 pm

I'd guess the opposite. Assuming you're in the German locale, I suspect the downloaded XML file is in UTF-8, not in the German code page. MsgBox cannot properly display UTF-8 text.
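
The mismatch described here can be reproduced outside AutoHotkey. The following is a minimal Python sketch, not part of the original thread. It assumes the ANSI code page in play is windows-1252, and the helper name `misread_utf8_as_ansi` is made up for illustration.

```python
def misread_utf8_as_ansi(text: str, ansi_codepage: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes as a single-byte ANSI code page.

    This mimics a program that receives UTF-8 bytes but treats every byte
    as one ANSI character. Bytes with no mapping are replaced, not raised.
    """
    return text.encode("utf-8").decode(ansi_codepage, errors="replace")


# Each umlaut is two bytes in UTF-8, so it shows up as two ANSI characters.
for ch in "öüßä":
    print(ch, "->", misread_utf8_as_ansi(ch))

print(misread_utf8_as_ansi("In der Hölle!"))  # In der HÃ¶lle!
```

Every character outside ASCII turns into two or more characters, which matches the symptom in the first post.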

[VxE]
Posted: Sat Aug 08, 2009 7:27 pm

Coincidentally, I posted a function yesterday that can read plain text files in UTF-8 format. The link is here. Non-ANSI characters are converted into &#12345; format. You may be interested to know that ï»¿ is the BOM that denotes UTF-8 format in a text file.

[Edit:] I updated my 'FileReadU' function to have a 'manual override' for the file's encoding.
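
Both points, the BOM and the &#12345; notation, can be illustrated with a short Python sketch. It does not reproduce FileReadU. As a simplifying assumption it escapes every character outside ASCII, and the function name `to_numeric_entities` is hypothetical.

```python
import codecs

# The UTF-8 byte order mark is the three bytes EF BB BF.
# Misread as windows-1252, those bytes display as three separate characters.
bom_as_ansi = codecs.BOM_UTF8.decode("cp1252")
print(bom_as_ansi)  # ï»¿


def to_numeric_entities(utf8_bytes: bytes) -> str:
    """Decode UTF-8 bytes and write non-ASCII characters as &#NNN; references."""
    # "utf-8-sig" drops a leading BOM if one is present.
    text = utf8_bytes.decode("utf-8-sig")
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = codecs.BOM_UTF8 + "In der Hölle!".encode("utf-8")
print(to_numeric_entities(sample))  # In der H&#246;lle!
```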
_________________
Ternary (a ? b : c) guide     TSV Table Manipulation Library
Post code inside [code][/code] tags!

TheLeO
Posted: Sun Aug 09, 2009 10:52 am

Sean wrote:
I'd guess the opposite. Assuming you're in the German locale, I suspect the downloaded XML file is in UTF-8, not in the German code page. MsgBox cannot properly display UTF-8 text.



Thanks for the replies, guys.

I think the MsgBox thing might be the issue. However, I can't "Send" the variable correctly either, which is really restricting me. I need to be able to send the content of the variable into a standard control somehow.

For example, try the script below; it demonstrates my point. Open Notepad and press F9: it pastes the text incorrectly.

>>However<<<
The same variable is also written to lyrics.txt, and if you open that file, the text is displayed correctly.

E.g. the text that is sent looks like this (i.e. the format is corrupted):
"
Sie sollen brennenSie sollen brennenIn der HölleStirbStirb, Fernseher, stirbStirb
"

The text in the lyrics.txt looks like this: (format is intact)
"
Sie sollen brennen!
Sie sollen brennen!
In der Hölle!
Stirb!
Stirb, Fernseher, stirb!
Stirb!
"


I'm running an English (US) version of Windows 7 and I type with a UK keyboard (if that makes any difference?). The website the lyrics are downloaded from is in English as well, i.e.:
http://lyricwiki.org/Farin_Urlaub:Dusche



I tried FileReadU, but it didn't seem to make a difference: the text is read correctly, but it's not sent correctly..?

This is a "bang-head-on-wall" type of situation for me: so close to the goal, but I can't quite get there. Frustrating..

---edit
I also tried:

ControlSend, RichEdit20W1, %SongLyricFinal%, 88
into the iTunes control that holds the lyrics.

But I get a mess:
N der hLle
Tirb, FErnseher, stirb

Instead of:
In der Hölle!
Stirb, Fernseher, stirb!

:(

Open Notepad and press F9, then open lyrics.txt in the script folder to compare.
Code:

return

f9::

;---------- download file
URLDownloadToFile,http://lyricwiki.org/Farin_Urlaub:Dusche, TEMPiTunesAutoLyricsCurrentsong.xml
Sleep 100

;----------- read xml file
FileRead, RawXMLFileVAR, TEMPiTunesAutoLyricsCurrentsong.xml

;RawXMLFileVAR := FileReadU("TEMPiTunesAutoLyricsCurrentsong.xml", ForceType="Auto: Unicode" )
;RawXMLFileVAR := FileReadU("TEMPiTunesAutoLyricsCurrentsong.xml", ForceType="Auto: UTF-8" )


;---------------- retrieve the line with the lyric
Loop, parse, RawXMLFileVAR, `n, `r  ; Specifying `n prior to `r allows both Windows and Unix files to be parsed.
{
  FailedString = There is currently no text in this page
  IfInString, A_LoopField,%FailedString% 
    {
    qtt(":-(")
    Return
    }
  IfInString, A_LoopField,class`='lyricbox'  ;note I'm escaping the equal sign.
    RawLineWithSongLyric = %A_LoopField%

}
;---------------------- format it correctly.
Needle = class`='lyricbox'
PositionOfNeedle := InStr(RawLineWithSongLyric, Needle)
FinalPositionOfNeedle := (PositionOfNeedle + 17)
StringTrimLeft, RawlineMinusStuffAtTheBeggining, RawLineWithSongLyric, %FinalPositionOfNeedle%
StringReplace, SongLyricFinal, RawlineMinusStuffAtTheBeggining, <br />, `n, All



;--------------->>>>>>>>>>>. THE OUT PUT PART<<<<<<<<<<<<<<<<<<<<<<,
FileAppend, %SongLyricFinal%, Lyrics.txt
Clipboard = %SongLyricFinal%
send, %SongLyricFinal%
;msgbox, %SongLyricFinal%


Sean
Posted: Sun Aug 09, 2009 1:10 pm

Your post is confusing. So you're in an English locale but can nevertheless read German text? I think all of this essentially reduces to Unicode vs. ANSI. If you need to handle text in a locale-independent manner, everything has to be handled in Unicode. Unfortunately, AHK is not a Unicode app, so it is tied to the currently selected ANSI code page. You therefore cannot send every kind of text with AHK's built-in Send... functions, especially to a Unicode window. My suggestion in this case: first convert the text from UTF-8 to UTF-16, then send it, e.g. using the SendInput API with the KEYEVENTF_UNICODE flag.
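
The conversion step suggested here can be sketched in Python. This covers only the UTF-8 to UTF-16 part and does not call SendInput. The function name `utf8_to_utf16_units` is an assumption for illustration. With KEYEVENTF_UNICODE, each 16-bit code unit would be passed in its own keyboard event.

```python
import struct
from typing import List


def utf8_to_utf16_units(utf8_bytes: bytes) -> List[int]:
    """Convert UTF-8 bytes into a list of UTF-16 code units (16-bit integers)."""
    text = utf8_bytes.decode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no BOM
    # Two bytes per code unit; "H" is an unsigned 16-bit integer.
    return list(struct.unpack("<%dH" % (len(utf16) // 2), utf16))


units = utf8_to_utf16_units("Hölle".encode("utf-8"))
print([hex(u) for u in units])  # ['0x48', '0xf6', '0x6c', '0x6c', '0x65']
```

Characters outside the Basic Multilingual Plane become two code units (a surrogate pair), so one character does not always mean one event.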

YMP
Posted: Sun Aug 09, 2009 2:36 pm

Yes, the downloaded file is in UTF-8. I suggest converting its text to ANSI (windows-1252) after FileRead. At least, after I did that, the umlauts were sent to Notepad correctly.
Code:

f9::

;---------- download file
URLDownloadToFile,http://lyricwiki.org/Farin_Urlaub:Dusche, TEMPiTunesAutoLyricsCurrentsong.xml
Sleep 100

;----------- read xml file
FileRead, RawXMLFileVAR, TEMPiTunesAutoLyricsCurrentsong.xml

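;---------- convert from UTF-8 to ANSI ----------

RawLen := StrLen(RawXMLFileVAR)   ; length in bytes (ANSI build); an upper bound on the wide-char count
BufSize := (RawLen + 1) * 2       ; bytes to allocate: 2 per wide char, plus the terminator
VarSetCapacity(Buf, BufSize, 0)

; UTF-8 (code page 65001) -> UTF-16; the last parameter counts wide chars, not bytes
DllCall("MultiByteToWideChar", "uint", 65001, "int", 0, "str", RawXMLFileVAR
                             , "int", -1, "uint", &Buf, "uint", RawLen + 1)
; UTF-16 -> ANSI (windows-1252), written back into the original variable
DllCall("WideCharToMultiByte", "uint", 1252, "int", 0, "uint", &Buf, "int", -1
                             , "str", RawXMLFileVAR, "uint", RawLen + 1
                             , "int", 0, "int", 0)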

;---------------- retrieve the line with the lyric
Loop, parse, RawXMLFileVAR, `n, `r  ; Specifying `n prior to `r allows both Windows and Unix files to be parsed.
{
  FailedString = There is currently no text in this page
  IfInString, A_LoopField,%FailedString% 
    {
    qtt(":-(")
    Return
    }
  IfInString, A_LoopField,class`='lyricbox'  ;note I'm escaping the equal sign.
    RawLineWithSongLyric = %A_LoopField%

}
;---------------------- format it correctly.
Needle = class`='lyricbox'
PositionOfNeedle := InStr(RawLineWithSongLyric, Needle)
FinalPositionOfNeedle := (PositionOfNeedle + 17)
StringTrimLeft, RawlineMinusStuffAtTheBeggining, RawLineWithSongLyric, %FinalPositionOfNeedle%
StringReplace, SongLyricFinal, RawlineMinusStuffAtTheBeggining, <br />, `n, All



;--------------->>>>>>>>>>>. THE OUT PUT PART<<<<<<<<<<<<<<<<<<<<<<,
FileAppend, %SongLyricFinal%, Lyrics.txt
Clipboard = %SongLyricFinal%
send, %SongLyricFinal%
;msgbox, %SongLyricFinal%
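
For comparison, the same two-step conversion (UTF-8 to wide characters, then wide characters to windows-1252) can be sketched in Python. This is an illustration under assumptions, not a port of the script above. The function name `utf8_bytes_to_ansi_bytes` is hypothetical, and characters missing from the target code page are replaced with "?", which is only roughly comparable to what WideCharToMultiByte does by default.

```python
def utf8_bytes_to_ansi_bytes(raw: bytes, ansi_codepage: str = "cp1252") -> bytes:
    """Re-encode UTF-8 bytes as single-byte ANSI bytes."""
    text = raw.decode("utf-8")                            # step 1: UTF-8 -> Unicode text
    return text.encode(ansi_codepage, errors="replace")   # step 2: Unicode -> ANSI


downloaded = "In der Hölle!".encode("utf-8")  # 14 bytes: "ö" takes two
converted = utf8_bytes_to_ansi_bytes(downloaded)
print(len(downloaded), len(converted))  # 14 13
print(converted)  # b'In der H\xf6lle!'
```

The result is never longer than the input, which is why the script can write the ANSI text back into the variable that held the UTF-8 text.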


Last edited by YMP on Wed Aug 12, 2009 3:50 pm; edited 1 time in total

TheLeO
Posted: Sun Aug 09, 2009 2:55 pm

YMP wrote:
Yes, the downloaded file is in UTF-8. I suggest converting its text to ANSI (windows-1252) after FileRead. At least, after I did that, the umlauts were sent to Notepad correctly.

Thank you soooo much. that did it!!!!!!

YMP
Posted: Wed Aug 12, 2009 3:52 pm

I forgot that the size of the Unicode buffer should be specified in wide characters and not in bytes. Fixed that.
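
The difference between the two sizes can be made concrete with a small Python sketch. The function name `wide_buffer_sizes` is hypothetical, and the sizing rule (UTF-8 byte count plus one for the terminator) follows the script in this thread.

```python
def wide_buffer_sizes(utf8_bytes: bytes):
    """Return (wide chars actually needed, capacity in wide chars, capacity in bytes)."""
    n_bytes = len(utf8_bytes)
    # Number of UTF-16 code units the text really needs (terminator excluded).
    n_wide = len(utf8_bytes.decode("utf-8").encode("utf-16-le")) // 2
    # A UTF-8 sequence of 1 to 3 bytes yields one code unit and a 4-byte
    # sequence yields two, so the byte count is a safe upper bound.
    capacity_wide = n_bytes + 1          # the count to pass to the API
    capacity_bytes = capacity_wide * 2   # the number of bytes to allocate
    return n_wide, capacity_wide, capacity_bytes


print(wide_buffer_sizes("Hölle!".encode("utf-8")))  # (6, 8, 16)
```

Passing the byte figure (16 here) where the wide-char count (8) is expected would claim twice the room that was allocated.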