Clipboard corrupted on systems with default English codepage
#1
stasok
Posted 15 June 2012 - 04:33 PM
I have Autohotkey_L (Unicode 64-bit installation option) installed on my test production system (Win 7 64-bit). The default system code page on that system is English. I use the code below (borrowed from Laszlo) to paste text without any formatting: it discards all clipboard formats except Unicode text, pastes the text, and then restores the original clipboard:
Clip0 = %ClipBoardAll%
ClipBoard = %ClipBoard% ; Convert to text
SendInput ^{vk56} ; Send Ctrl+V to window
; Don't change clipboard while it is pasted!
Sleep 100
ClipBoard = %Clip0% ; Restore original ClipBoard
VarSetCapacity(Clip0, 0) ; Free memory
If I copy Russian text to the clipboard and use the code, it works fine when the system code page is Russian. However, if the system code page is English (on my test production system), the first paste attempt works fine, but all subsequent paste attempts paste ???? instead of Russian characters. English letters are unchanged. Because I am using Autohotkey_L Unicode 64-bit, there should not be any problems with Unicode to ANSI conversion. However, I believe that the assignment ClipBoard = %Clip0% actually performs conversion using the system code page, otherwise the problem would not occur.
I've checked ErrorLevel after the assignment line and it is 0. The code works similarly on my colleagues' computers with default English code page.
Does anyone have any idea what is happening? I would appreciate any help,
Thanks!
Stanislav
#2
stasok
Posted 15 June 2012 - 05:05 PM
I have checked the clipboard contents with Freeclipviewer before and after using the above code. After the assignment of Clip0 to Clipboard variable, the number of clipboard formats (originally 20 after copying text in Word) drops down to 12 and the Unicode text format changes. HTML, Richtext, metafile and other clipboard formats are okay.
Regards,
Stanislav
#3
stasok
Posted 15 June 2012 - 07:51 PM
It seems I have corrected the problem by using the Paste method of the WinClip class (viewtopic.php?p=498667). However, I think clipboard restoration in Autohotkey_L does not work as intended and should be changed to work more like the WinClip class.
Best regards,
Stanislav
#4
Posted 16 June 2012 - 03:52 PM
Clipboard = %Clip0% does not perform any text conversions; it merely copies binary data back onto the clipboard. WinClip appears to work the same way. However, if I'm not mistaken, it discards the CF_TEXT and CF_OEMTEXT formats. ClipboardAll saves the first text format, which may be CF_TEXT, CF_OEMTEXT or CF_UNICODETEXT. It sounds like you're getting (non-Unicode) CF_TEXT, and in the process of saving and restoring it, the locale information is lost.
I'll look into this some more later.
#5
Posted 17 June 2012 - 01:15 AM
What I find odd is that the input language is used by default:
"When you close the clipboard, if it contains CF_TEXT data but no CF_LOCALE data, the system automatically sets the CF_LOCALE format to the current input language."
Source: Standard Clipboard Formats
Normally, non-Unicode applications use the system default ANSI code page for all strings. While the input language can change at any time, the system code page remains constant and is the same for all applications until the OS restarts (at which time it can change). So I don't know under what circumstances ANSI text copied to the clipboard would actually be in the format defined by the input language rather than the system code page.
That aside, as far as I can tell, AutoHotkey does not affect the interpretation or binary value of CF_TEXT. However, if an application copies both CF_TEXT (first) and CF_UNICODETEXT (second), only the first format is kept. It seems likely that the CF_TEXT data would be in the system code page, which (if set to US English) could not contain Russian characters, regardless of the input language. You would be able to paste the text into Unicode-aware applications up until AutoHotkey discards the CF_UNICODETEXT data.
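The code page point above can be illustrated with a minimal sketch. It uses Python's built-in codecs as a stand-in for the system's ANSI conversion; it is not AutoHotkey or Windows API code, and the sample string is an assumption chosen only to have six Russian letters.

```python
# Illustration only: Python codecs standing in for the ANSI code page
# conversion; not AutoHotkey or Windows API code.
text = "Привет"  # six Russian letters (sample text, an assumption)

# The US English ANSI code page (Windows-1252) has no Cyrillic letters, so
# each one is replaced with '?' when the text is forced into that encoding.
as_english_ansi = text.encode("cp1252", errors="replace")

# The Russian ANSI code page (Windows-1251) can represent the text exactly.
as_russian_ansi = text.encode("cp1251")

# UTF-16LE, the encoding used for CF_UNICODETEXT, is also lossless.
as_utf16 = text.encode("utf-16-le")

print(as_english_ansi)                       # b'??????'
print(as_russian_ansi.decode("cp1251"))      # Привет
print(as_utf16.decode("utf-16-le") == text)  # True
```

Once only the Windows-1252 bytes survive, nothing can recover the original letters, which would match the ?????? seen on later pastes.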
Now, I don't have a clue why any application would store both formats explicitly, since the system does automatic conversion. I would like to know:
- Which clipboard formats you observed before and after performing the assignment.
- The actual binary data/encoding of each text format on the clipboard.
- Where you are copying from and pasting to.
#6
stasok
Posted 17 June 2012 - 07:19 PM
First, thanks for the excellent Autohotkey_L.
Here is what I did and what happened:
- I used Microsoft Word
- When I copy something to clipboard, the following clipboard formats appear on the clipboard:
Rich Text Format, HTML Format, Text (??????), Locale Identifier (09 04 00 00 binary - when current input language is English, 19 04 00 00 - Russian), Unicode Text Format (russian_text_6_letters), OEM Text (??????)
Other formats include: Ole Private Data, Hyperlink, HyperlinkWordBkmk, ObjectLink, Link Source Descriptor, Link Source, OwnerLink, Native, Embed Source, Object Descriptor, DataObject, Metafile Picture Format, Enhanced Metafile
When I run the code that stores clipboardall to a variable and then restores from that variable, I get this:
Rich Text Format (OK), HTML Format (OK), Text (??????), Locale Identifier (09 04 00 00 binary - when current input language is English, 19 04 00 00 - Russian), Unicode Text Format (??????), OEM Text (??????)
Also retained are DataObject, Object Descriptor, HyperlinkWordBkmk, Hyperlink, Ole Private Data. Other formats disappear.
Question marks show 6 Russian letters as they appear in Free Clipboard Viewer.
In other words, Unicode Text Format becomes equal to Text format after the assignment.
I tried running the code when the Input language was either Russian or English, but the result was the same. Only the Locale Identifier clipboard format changed as described above.
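As a side note on the Locale Identifier bytes, here is a minimal sketch of how the four bytes shown by the clipboard viewer can be read as a little-endian locale ID. The helper name lcid_from_clipboard_bytes and the small lookup table are hypothetical and exist only for this illustration.

```python
def lcid_from_clipboard_bytes(hex_dump: str) -> int:
    """Decode a 4-byte little-endian value shown as hex, e.g. '09 04 00 00'."""
    return int.from_bytes(bytes.fromhex(hex_dump), "little")

# Only the two locale IDs seen in this thread; not a complete table.
KNOWN_LCIDS = {
    0x0409: "English (United States)",
    0x0419: "Russian (Russia)",
}

for dump in ("09 04 00 00", "19 04 00 00"):
    lcid = lcid_from_clipboard_bytes(dump)
    name = KNOWN_LCIDS.get(lcid, "unknown")
    print(f"{dump} -> 0x{lcid:04X} ({name})")
```

So the value follows the current input language (0x0409 or 0x0419) rather than the system code page.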
When I used Notepad, it copied Text (??????), Unicode Text Format (6 Russian letters), OEM Text (??????) and Locale Identifier (19 04 00 00 - for some reason different from above). After the assignment, the clipboard contents are the same: no clipboard corruption occurs.
However, when I integrated WinClip class, wc.Snap(data) and wc.Restore(data) store and restore clipboard data more accurately. I only lose OwnerLink format after the assignment, and Unicode Text Format contains the original text after restoring.
Thanks a lot,
Best regards,
Stanislav
#7
stasok
Posted 17 June 2012 - 07:22 PM
In the case of Notepad, Locale Identifier contains 09 04 00 00 when current input language is English, 19 04 00 00 when input language is Russian, just like in Word.
Regards,
Stanislav
#8
Posted 17 June 2012 - 09:39 PM
Typical of Microsoft not to follow their own recommendations: using multiple text formats and putting the least preferred one first.
FYI, some detailed comments from the source code:
// EnumClipboardFormats() retrieves all formats, including synthesized formats that don't
// actually exist on the clipboard but are instead constructed on demand. Unfortunately,
// there doesn't appear to be any way to reliably determine which formats are real and
// which are synthesized (if there were such a way, a large memory savings could be
// realized by omitting the synthesized formats from the saved version). One thing that
// is certain is that the "real" format(s) come first and the synthesized ones afterward.
// However, that's not quite enough because although it is recommended that apps store
// the primary/preferred format first, the OS does not enforce this. For example, testing
// shows that the apps do not have to store CF_UNICODETEXT prior to storing CF_TEXT,
// in which case the clipboard might have inaccurate CF_TEXT as the first element and
// more accurate/complete (non-synthesized) CF_UNICODETEXT stored as the next.
// In spite of the above, the below seems likely to be accurate 99% or more of the time,
// which seems worth it given the large savings of memory that are achieved, especially
// for large quantities of text or large images. Confidence is further raised by the
// fact that MSDN says there's no advantage/reason for an app to place multiple formats
// onto the clipboard if those formats are available through synthesis.
// And since CF_TEXT always(?) yields synthetic CF_OEMTEXT and CF_UNICODETEXT, and
// probably (but less certainly) vice versa: if CF_TEXT is listed first, it might certainly
// mean that the other two do not need to be stored. There is some slight doubt about this
// in a situation where an app explicitly put CF_TEXT onto the clipboard and then followed
// it with CF_UNICODETEXT that isn't synthesized, nor does it match what would have been
// synthesized. However, that seems extremely unlikely (it would be much more likely for
// an app to store CF_UNICODETEXT *first* followed by custom/non-synthesized CF_TEXT, but
// even that might be unheard of in practice). So for now -- since there is no documentation
// to be found about this anywhere -- it seems best to omit some of the most common
// synthesized formats:
// CF_TEXT is the first of three text formats to appear: Omit CF_OEMTEXT and CF_UNICODETEXT.
// (but not vice versa since those are less certain to be synthesized)
// (above avoids using four times the amount of memory that would otherwise be required)
// UPDATE: Only the first text format is included now, since MSDN says there is no
// advantage/reason to having multiple non-synthesized text formats on the clipboard.
I see two possible solutions:
- Always store only CF_UNICODETEXT, assuming any applications requesting CF_TEXT will get a correctly-synthesized value.
- Always store all text formats, despite the increased memory usage.
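The difference between the current behaviour and the two options can be sketched as a small simulation. This is not AutoHotkey's actual implementation: the function names are hypothetical, and the format order is an assumption taken from the Word listing earlier in the thread. The numeric values are the standard clipboard format IDs.

```python
# Standard clipboard format IDs.
CF_TEXT, CF_OEMTEXT, CF_UNICODETEXT, CF_LOCALE = 1, 7, 13, 16
TEXT_FORMATS = {CF_TEXT, CF_OEMTEXT, CF_UNICODETEXT}

def keep_first_text_format(formats):
    """Behaviour described in the source comments: keep only the first text format."""
    kept, seen_text = [], False
    for fmt in formats:
        if fmt in TEXT_FORMATS:
            if seen_text:
                continue  # later text formats are assumed to be synthesized
            seen_text = True
        kept.append(fmt)
    return kept

def keep_unicode_only(formats):
    """Option 1: store CF_UNICODETEXT and drop the other text formats."""
    if CF_UNICODETEXT not in formats:
        return keep_first_text_format(formats)
    return [f for f in formats if f not in (CF_TEXT, CF_OEMTEXT)]

def keep_all(formats):
    """Option 2: store every format, at the cost of more memory."""
    return list(formats)

# Assumed enumeration order, based on the Word listing in this thread.
word_formats = ["Rich Text Format", "HTML Format",
                CF_TEXT, CF_LOCALE, CF_UNICODETEXT, CF_OEMTEXT]

print(keep_first_text_format(word_formats))  # CF_UNICODETEXT (13) is dropped
print(keep_unicode_only(word_formats))       # CF_UNICODETEXT (13) survives
print(keep_all(word_formats))
```

With CF_TEXT listed first, the first strategy keeps only the lossy ANSI text, while either option preserves the Unicode text.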
#9
stasok
Posted 18 June 2012 - 05:57 AM
It does seem that Microsoft has problems in their handling of the clipboard. I'll stick with WinClip for now because it seems to keep the clipboard almost untouched.
Although my case is a bit rare, do you intend to change the way Autohotkey_L restores the clipboard? Maybe it's better to just put all clipboard formats back and not drop CF_UNICODETEXT?
Best regards,
Stanislav
#10
Posted 18 June 2012 - 07:30 AM
#11
stasok
Posted 18 June 2012 - 04:00 PM




