AutoHotkey Community Let's help each other out
TheLeO
Joined: 11 Jun 2005 Posts: 264 Location: England ish
Posted: Sat Aug 08, 2009 12:48 pm Post subject: FileRead is misbehaving with Umlauts, äüöß (RESOLVED)
It's kind of weird, and I can't figure out why it's happening. Instead of the ßäöü characters, I get:
ö = Ã¶
ü = Ã¼
ß = ÃŸ
ä = Ã¤
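These substitutions are exactly what UTF-8 bytes look like when they are read as Windows-1252 text, the ANSI codepage on English and Western European Windows. The short Python sketch below is only meant to show the byte-level mechanism (`misread_as_cp1252` is a hypothetical helper, not something from this thread); it reproduces the four pairs:

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")

for ch in "öüßä":
    print(ch, "->", misread_as_cp1252(ch))
# ö -> Ã¶
# ü -> Ã¼
# ß -> ÃŸ
# ä -> Ã¤
```

Each umlaut is two bytes in UTF-8 (0xC3 plus one more), and an ANSI program displays each byte as its own character.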
Here are two examples.
This works great.
| Code: |
msg = In der Hölle!
msgbox, %msg%
|
This one displays incorrectly:
| Code: |
FileAppend, In der Hölle!, temp.txt
FileRead, umlaute, temp.txt
MsgBox %umlaute%
|
---------------------------
iTunesAutoLyrics.ahk
---------------------------
In der Hölle!!
---------------------------
OK
---------------------------
Or, in my script, it's also often shown as "In der Hölle!".
A lot of the time I get output like this:
weiÃŸ
plÃ¶tzlich
HÃ¶lle!
zu mir hÃ¤lt
falschem StÃ¼ck
Any thoughts?
Any help or known workarounds would really be appreciated!
Here is the full script, by the way, which shows the problem more clearly: in the message box that comes up, you get a lot of weird characters instead of the umlauts.
| Code: |
URLDownloadToFile,http://lyricwiki.org/Farin_Urlaub:Dusche, TEMPiTunesAutoLyricsCurrentsong.xml
Sleep 100
FileRead, RawXMLFileVAR, TEMPiTunesAutoLyricsCurrentsong.xml
Loop, parse, RawXMLFileVAR, `n, `r ; Specifying `n prior to `r allows both Windows and Unix files to be parsed.
{
FailedString = There is currently no text in this page
IfInString, A_LoopField,%FailedString%
{
qtt(":-(")
Return
}
IfInString, A_LoopField,class`='lyricbox' ;note I'm escaping the equal sign.
RawLineWithSongLyric = %A_LoopField%
}
Needle = class`='lyricbox'
PositionOfNeedle := InStr(RawLineWithSongLyric, Needle)
FinalPositionOfNeedle := (PositionOfNeedle + 17)
StringTrimLeft, RawlineMinusStuffAtTheBeggining, RawLineWithSongLyric, %FinalPositionOfNeedle%
StringReplace, SongLyricFinal, RawlineMinusStuffAtTheBeggining, <br />, `n, All
msgbox, %SongLyricFinal%
|
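For comparison, a minimal Python sketch of the same parsing steps follows; it is not from the thread, and `extract_lyrics` is a hypothetical name. It assumes the page puts the lyric on a single line after an opening tag containing class='lyricbox', with <br /> between lyric lines. Instead of the fixed +17 offset it skips to the next `>`, and it decodes the downloaded bytes as UTF-8 before searching, which is the step the later replies identify as missing.

```python
NEEDLE = "class='lyricbox'"
FAILED = "There is currently no text in this page"

def extract_lyrics(raw: bytes) -> str:
    """Hypothetical Python analog of the AHK parsing steps (assumed page layout)."""
    text = raw.decode("utf-8")  # decode first so the umlauts survive
    line = None
    for candidate in text.splitlines():  # handles both `r`n and `n line endings
        if FAILED in candidate:
            return ""  # page has no lyrics, like the early Return in the script
        if NEEDLE in candidate:
            line = candidate  # like the AHK loop, keep the last matching line
    if line is None:
        return ""
    start = line.index(NEEDLE)
    after_tag = line.index(">", start) + 1  # skip to the end of the opening tag
    return line[after_tag:].replace("<br />", "\n")
```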
Last edited by TheLeO on Sun Aug 09, 2009 2:57 pm; edited 1 time in total
SoLong&Thx4AllTheFish
Joined: 27 May 2007 Posts: 4999
Posted: Sat Aug 08, 2009 1:07 pm Post subject: Re: FileRead variables are misbehaving with Umlauts, ä ü ö
| TheLeO wrote: | | Code: |
FileAppend, In der Hölle!, temp.txt
FileRead, umlaute, temp.txt
MsgBox %umlaute%
|
This also works fine for me. Make sure the AHK script is saved as ANSI, not UTF-8/Unicode. If you use Notepad++, for example, you can change the encoding via the Format menu.
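To illustrate the point about the script's own encoding, here is what the literal from the example looks like on disk in each encoding, as a small Python sketch. It assumes an ANSI build of AutoHotkey that reads the script with the Windows-1252 codepage: saved as ANSI, ö is one byte; saved as UTF-8 it is two, and an ANSI program shows those two bytes as two characters.

```python
literal = "In der Hölle!"

ansi_bytes = literal.encode("cp1252")  # bytes in a script saved as ANSI
utf8_bytes = literal.encode("utf-8")   # bytes in a script saved as UTF-8 (no BOM)

# An ANSI program maps each byte to one Windows-1252 character.
seen_from_ansi_file = ansi_bytes.decode("cp1252")
seen_from_utf8_file = utf8_bytes.decode("cp1252")

print(ansi_bytes)           # b'In der H\xf6lle!'
print(utf8_bytes)           # b'In der H\xc3\xb6lle!'
print(seen_from_ansi_file)  # In der Hölle!
print(seen_from_utf8_file)  # In der HÃ¶lle!
```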
Sean
Joined: 12 Feb 2007 Posts: 2462
Posted: Sat Aug 08, 2009 2:20 pm
I think it's the opposite. Assuming you're in a German locale, I suspect the downloaded XML file is in UTF-8, not in the German codepage. MsgBox cannot display UTF-8 text properly.
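One way to test this hypothesis on the downloaded file is a simple heuristic, sketched in Python below. `guess_encoding` is a hypothetical helper and the fallback assumes Windows-1252. A UTF-8 BOM is conclusive; pure ASCII is identical in both encodings; otherwise, bytes that decode cleanly as UTF-8 are very likely UTF-8, because German ANSI text with umlauts almost never happens to form valid UTF-8 sequences.

```python
def guess_encoding(data: bytes) -> str:
    """Rough heuristic: BOM, then pure ASCII, then valid UTF-8, else assume ANSI."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8 (BOM)"
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "windows-1252 (assumed)"
```

To use it, pass the bytes of TEMPiTunesAutoLyricsCurrentsong.xml read in binary mode.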
[VxE]
Joined: 07 Oct 2006 Posts: 3254 Location: Simi Valley, CA
Posted: Sat Aug 08, 2009 7:27 pm
Coincidentally, I posted a function yesterday that can read plain-text files in UTF-8 format. The link is here. Non-ANSI characters are converted into &#12345; format. You may be interested to know that ï»¿ (the bytes EF BB BF) is the BOM that denotes UTF-8 format in a text file.
Edit: I updated my FileReadU function to add a 'manual override' for the file's encoding.
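FileReadU itself is not reproduced here (the link did not survive). As a rough Python analog of the behaviour described, assuming the target ANSI codepage is Windows-1252: strip the BOM, decode the UTF-8, and turn any character Windows-1252 cannot represent into a &#NNNN; reference. `read_utf8_to_ansi` is a hypothetical name.

```python
BOM = b"\xef\xbb\xbf"

def read_utf8_to_ansi(data: bytes) -> str:
    # "utf-8-sig" drops a leading BOM if there is one, otherwise it behaves like "utf-8".
    text = data.decode("utf-8-sig")
    # Keep what Windows-1252 can represent; everything else becomes a &#NNNN; reference.
    return text.encode("cp1252", errors="xmlcharrefreplace").decode("cp1252")

print(BOM.decode("cp1252"))                                  # ï»¿  (the BOM as an ANSI editor shows it)
print(read_utf8_to_ansi(BOM + "Hölle 〹".encode("utf-8")))  # Hölle &#12345;
```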
TheLeO
Joined: 11 Jun 2005 Posts: 264 Location: England ish
Posted: Sun Aug 09, 2009 10:52 am
| Sean wrote: | | I suppose the opposite. Assuming you're in the German locale, I suspect the downloaded xml file is in UTF-8, not in German codepage. MsgBox cannot display properly texts in UTF-8. |
Thanks for the replies, guys.
I think the MsgBox might be part of the issue. However, I can't Send the variable correctly either, which really restricts me.
I need to be able to send the content of the variable into a standard control somehow.
For example, try the script below; it demonstrates my point:
Open Notepad and press F9; it will type the text incorrectly.
However, it uses the same variable to write lyrics.txt, and if you open that file, the text is displayed correctly.
For example, the text that is sent looks like this (corrupted):
"
Sie sollen brennenSie sollen brennenIn der HölleStirbStirb, Fernseher, stirbStirb
"
The text in lyrics.txt looks like this (formatting intact):
"
Sie sollen brennen!
Sie sollen brennen!
In der Hölle!
Stirb!
Stirb, Fernseher, stirb!
Stirb!
"
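Separately from the encoding problem, the exclamation marks, and the line breaks right after them, vanish from the sent text. A likely explanation: in AutoHotkey's Send, ! is the Alt modifier (as +, ^ and # are Shift, Ctrl and Win), so "!`n" is sent as Alt+Enter. SendRaw, or wrapping those characters in braces such as {!}, types them literally. The Python helper below (hypothetical, not from the thread) only illustrates that brace-escaping rule by building the string one would pass to Send.

```python
SEND_SPECIAL = "!+^#{}"  # Send modifiers and brace delimiters

def escape_for_send(text: str) -> str:
    """Wrap each Send special character in braces so it is typed literally."""
    return "".join("{" + ch + "}" if ch in SEND_SPECIAL else ch for ch in text)

print(escape_for_send("Sie sollen brennen!"))  # Sie sollen brennen{!}
```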
I'm running an English (US) version of Windows 7 and I type using a UK keyboard, if that makes any difference. The website the lyrics are downloaded from is in English as well:
http://lyricwiki.org/Farin_Urlaub:Dusche
I tried FileReadU, but it didn't seem to make a difference: the text is read correctly, but it isn't sent correctly.
This is a "bang-head-on-wall" type situation for me: so close to the goal, but I can't quite get there. Frustrating...
Edit:
I also tried:
ControlSend, RichEdit20W1, %SongLyricFinal%, 88
into the iTunes control that holds the lyrics.
But I get a mess:
N der hLle
Tirb, FErnseher, stirb
instead of:
In der Hölle!
Stirb, Fernseher, stirb!
To reproduce: open Notepad and press F9, then open lyrics.txt in the script folder to compare.
| Code: |
return
f9::
;---------- download file
URLDownloadToFile,http://lyricwiki.org/Farin_Urlaub:Dusche, TEMPiTunesAutoLyricsCurrentsong.xml
Sleep 100
;----------- read xml file
FileRead, RawXMLFileVAR, TEMPiTunesAutoLyricsCurrentsong.xml
;RawXMLFileVAR := FileReadU("TEMPiTunesAutoLyricsCurrentsong.xml", ForceType="Auto: Unicode" )
;RawXMLFileVAR := FileReadU("TEMPiTunesAutoLyricsCurrentsong.xml", ForceType="Auto: UTF-8" )
;---------------- retrieve the line with the lyric
Loop, parse, RawXMLFileVAR, `n, `r ; Specifying `n prior to `r allows both Windows and Unix files to be parsed.
{
FailedString = There is currently no text in this page
IfInString, A_LoopField,%FailedString%
{
qtt(":-(")
Return
}
IfInString, A_LoopField,class`='lyricbox' ;note I'm escaping the equal sign.
RawLineWithSongLyric = %A_LoopField%
}
;---------------------- format it correctly.
Needle = class`='lyricbox'
PositionOfNeedle := InStr(RawLineWithSongLyric, Needle)
FinalPositionOfNeedle := (PositionOfNeedle + 17)
StringTrimLeft, RawlineMinusStuffAtTheBeggining, RawLineWithSongLyric, %FinalPositionOfNeedle%
StringReplace, SongLyricFinal, RawlineMinusStuffAtTheBeggining, <br />, `n, All
;--------------->>>>>>>>>>>. THE OUT PUT PART<<<<<<<<<<<<<<<<<<<<<<,
FileAppend, %SongLyricFinal%, Lyrics.txt
Clipboard = %SongLyricFinal%
send, %SongLyricFinal%
;msgbox, %SongLyricFinal%
|
Sean
Joined: 12 Feb 2007 Posts: 2462
Posted: Sun Aug 09, 2009 1:10 pm
Your post is a bit confusing. So you're in an English locale, but you can read German text? I think all of this essentially comes down to Unicode vs. ANSI. If you need to handle text in a locale-independent way, everything has to be handled as Unicode. Unfortunately, AHK is not a Unicode application, so it is tied to the ANSI codepage currently selected. That means you cannot send arbitrary text with AHK's built-in Send commands, especially to a Unicode window. My suggestion in this case: first convert the text from UTF-8 to UTF-16, then send it, e.g. using the SendInput API with the KEYEVENTF_UNICODE flag.
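A minimal Python sketch of the data this approach needs, assuming the Win32 constants KEYEVENTF_KEYUP = 0x0002 and KEYEVENTF_UNICODE = 0x0004: the UTF-8 text is decoded and split into UTF-16 code units, and each unit becomes a key-down/key-up pair with wVk = 0 and wScan set to the code unit. It only builds the event list as (wVk, wScan, dwFlags) triples; actually calling SendInput (for example through ctypes on Windows) is left out, and the helper names are hypothetical.

```python
KEYEVENTF_KEYUP = 0x0002
KEYEVENTF_UNICODE = 0x0004

def utf16_units(text: str) -> list[int]:
    """Split text into UTF-16 code units (characters outside the BMP become surrogate pairs)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def unicode_key_events(utf8_bytes: bytes) -> list[tuple[int, int, int]]:
    """Build (wVk, wScan, dwFlags) triples: one key-down and one key-up per code unit."""
    events = []
    for unit in utf16_units(utf8_bytes.decode("utf-8")):
        events.append((0, unit, KEYEVENTF_UNICODE))                    # key down
        events.append((0, unit, KEYEVENTF_UNICODE | KEYEVENTF_KEYUP))  # key up
    return events

print(unicode_key_events("Hö".encode("utf-8")))  # [(0, 72, 4), (0, 72, 6), (0, 246, 4), (0, 246, 6)]
```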
YMP
Joined: 23 Dec 2006 Posts: 418 Location: Russia
Posted: Sun Aug 09, 2009 2:36 pm
Yes, the downloaded file is in UTF-8. I suggest converting its text to ANSI (Windows-1252) after FileRead. At least, after I did that, the umlauts were sent to Notepad correctly.
| Code: |
f9::
;---------- download file
URLDownloadToFile,http://lyricwiki.org/Farin_Urlaub:Dusche, TEMPiTunesAutoLyricsCurrentsong.xml
Sleep 100
;----------- read xml file
FileRead, RawXMLFileVAR, TEMPiTunesAutoLyricsCurrentsong.xml
;---------- convert from UTF-8 to ANSI ----------
RawLen := StrLen(RawXMLFileVAR)
BufSize := (RawLen + 1) * 2
VarSetCapacity(Buf, BufSize, 0)
DllCall("MultiByteToWideChar", "uint", 65001, "int", 0, "str", RawXMLFileVAR
, "int", -1, "uint", &Buf, "uint", RawLen + 1)
DllCall("WideCharToMultiByte", "uint", 1252, "int", 0, "uint", &Buf, "int", -1
, "str", RawXMLFileVAR, "uint", RawLen + 1
, "int", 0, "int", 0)
;---------------- retrieve the line with the lyric
Loop, parse, RawXMLFileVAR, `n, `r ; Specifying `n prior to `r allows both Windows and Unix files to be parsed.
{
FailedString = There is currently no text in this page
IfInString, A_LoopField,%FailedString%
{
qtt(":-(")
Return
}
IfInString, A_LoopField,class`='lyricbox' ;note I'm escaping the equal sign.
RawLineWithSongLyric = %A_LoopField%
}
;---------------------- format it correctly.
Needle = class`='lyricbox'
PositionOfNeedle := InStr(RawLineWithSongLyric, Needle)
FinalPositionOfNeedle := (PositionOfNeedle + 17)
StringTrimLeft, RawlineMinusStuffAtTheBeggining, RawLineWithSongLyric, %FinalPositionOfNeedle%
StringReplace, SongLyricFinal, RawlineMinusStuffAtTheBeggining, <br />, `n, All
;--------------->>>>>>>>>>>. THE OUT PUT PART<<<<<<<<<<<<<<<<<<<<<<,
FileAppend, %SongLyricFinal%, Lyrics.txt
Clipboard = %SongLyricFinal%
send, %SongLyricFinal%
;msgbox, %SongLyricFinal%
|
Last edited by YMP on Wed Aug 12, 2009 3:50 pm; edited 1 time in total
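The two DllCalls amount to decoding UTF-8 and re-encoding the result as Windows-1252. The Python sketch below does the same in two steps; `utf8_to_windows1252` is a hypothetical name, and mapping unrepresentable characters to "?" is an assumption meant to resemble WideCharToMultiByte's default character.

```python
def utf8_to_windows1252(raw: bytes) -> bytes:
    # Step 1, like MultiByteToWideChar with code page 65001: UTF-8 bytes -> Unicode text.
    text = raw.decode("utf-8")
    # Step 2, like WideCharToMultiByte with code page 1252: Unicode text -> Windows-1252 bytes.
    # Characters with no Windows-1252 equivalent become "?" (assumed default character).
    return text.encode("cp1252", errors="replace")

print(utf8_to_windows1252("In der Hölle!".encode("utf-8")))  # b'In der H\xf6lle!'
```

Every Windows-1252 character takes one byte and every UTF-8 character at least one, so the result is never longer than the input. That is why the script can write the converted text back into RawXMLFileVAR in place.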
TheLeO
Joined: 11 Jun 2005 Posts: 264 Location: England ish
Posted: Sun Aug 09, 2009 2:55 pm
| YMP wrote: | Yes, the downloaded file is in UTF-8. I suggest to try converting its text to ANSI (windows-1252) after FileRead. At least, after I did that, umlauts were sent to Notepad correctly.
| Code: |
f9::
;---------- download file
URLDownloadToFile,http://lyricwiki.org/Farin_Urlaub:Dusche, TEMPiTunesAutoLyricsCurrentsong.xml
Sleep 100
;----------- read xml file
FileRead, RawXMLFileVAR, TEMPiTunesAutoLyricsCurrentsong.xml
;---------- convert from UTF-8 to ANSI ----------
RawLen := StrLen(RawXMLFileVAR)
BufSize := (RawLen + 1) * 2
VarSetCapacity(Buf, BufSize, 0)
DllCall("MultiByteToWideChar", "uint", 65001, "int", 0, "str", RawXMLFileVAR
, "int", -1, "uint", &Buf, "uint", BufSize)
DllCall("WideCharToMultiByte", "uint", 1252, "int", 0, "uint", &Buf, "int", -1
, "str", RawXMLFileVAR, "uint", RawLen + 1
, "int", 0, "int", 0)
;---------------- retrieve the line with the lyric
Loop, parse, RawXMLFileVAR, `n, `r ; Specifying `n prior to `r allows both Windows and Unix files to be parsed.
{
FailedString = There is currently no text in this page
IfInString, A_LoopField,%FailedString%
{
qtt(":-(")
Return
}
IfInString, A_LoopField,class`='lyricbox' ;note I'm escaping the equal sign.
RawLineWithSongLyric = %A_LoopField%
}
;---------------------- format it correctly.
Needle = class`='lyricbox'
PositionOfNeedle := InStr(RawLineWithSongLyric, Needle)
FinalPositionOfNeedle := (PositionOfNeedle + 17)
StringTrimLeft, RawlineMinusStuffAtTheBeggining, RawLineWithSongLyric, %FinalPositionOfNeedle%
StringReplace, SongLyricFinal, RawlineMinusStuffAtTheBeggining, <br />, `n, All
;--------------->>>>>>>>>>>. THE OUT PUT PART<<<<<<<<<<<<<<<<<<<<<<,
FileAppend, %SongLyricFinal%, Lyrics.txt
Clipboard = %SongLyricFinal%
send, %SongLyricFinal%
;msgbox, %SongLyricFinal%
|
|
Thank you so much, that did it!
YMP
Joined: 23 Dec 2006 Posts: 418 Location: Russia
Posted: Wed Aug 12, 2009 3:52 pm
I forgot that the size of the Unicode buffer passed to MultiByteToWideChar has to be specified in wide characters, not in bytes. Fixed that (RawLen + 1 instead of BufSize).
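Why the fix matters: VarSetCapacity takes a size in bytes, while MultiByteToWideChar's last argument (cchWideChar) counts wide characters. The buffer holds (RawLen + 1) * 2 bytes, i.e. RawLen + 1 wide characters, so passing BufSize claims twice the real capacity. A UTF-8 string never needs more UTF-16 code units than it has bytes, so RawLen + 1 (including the terminator) is always enough. The Python sketch below checks that arithmetic; `buffer_sizes` is a hypothetical helper, and it assumes StrLen() of an ANSI AHK variable equals its byte count.

```python
def buffer_sizes(raw_utf8: bytes) -> tuple[int, int, int]:
    raw_len = len(raw_utf8)        # StrLen() of the ANSI variable = number of bytes
    cch_wide = raw_len + 1         # capacity in wide characters, incl. terminator (cchWideChar)
    cb_bytes = cch_wide * 2        # the same capacity in bytes (what VarSetCapacity takes)
    needed = len(raw_utf8.decode("utf-8").encode("utf-16-le")) // 2 + 1  # wide chars actually needed
    return cch_wide, cb_bytes, needed

print(buffer_sizes("Hölle".encode("utf-8")))  # (7, 14, 6)
```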