

Clipboard corrupted on systems with default English codepage


10 replies to this topic

#1 stasok

stasok
  • Guests

Posted 15 June 2012 - 04:33 PM

Hello,

I have Autohotkey_L (Unicode 64-bit installation option) installed on my test production system (Win 7 64-bit). The default system code page on that system is English. I use the code below (borrowed from Laszlo) to paste text without any formatting: it discards all clipboard formats except Unicode text, pastes the text, and then restores the original clipboard:

Clip0 = %ClipBoardAll%
ClipBoard = %ClipBoard% ; Convert to text
SendInput ^{vk56} ; Send Ctrl+V to window

; Don't change clipboard while it is pasted!
Sleep 100

ClipBoard = %Clip0% ; Restore original ClipBoard
VarSetCapacity(Clip0, 0) ; Free memory

If I copy Russian text to the clipboard and use the code, it works fine when the system code page is Russian. However, if the system code page is English (on my test production system), the first paste attempt works fine, but all subsequent paste attempts paste ???? instead of Russian characters. English letters are unchanged. Because I am using Autohotkey_L Unicode 64-bit, there should not be any problems with Unicode to ANSI conversion. However, I believe that the assignment ClipBoard = %Clip0% actually performs conversion using the system code page, otherwise the problem would not occur.

I've checked ErrorLevel after the assignment line and it is 0. The code works similarly on my colleagues' computers with default English code page.

Does anyone have any idea what is happening? I would appreciate any help,

Thanks!

Stanislav

#2 stasok

stasok
  • Guests

Posted 15 June 2012 - 05:05 PM

Just a side note.

I have checked the clipboard contents with Freeclipviewer before and after using the above code. After the assignment of Clip0 to Clipboard variable, the number of clipboard formats (originally 20 after copying text in Word) drops down to 12 and the Unicode text format changes. HTML, Richtext, metafile and other clipboard formats are okay.

Regards,
Stanislav

#3 stasok

stasok
  • Guests

Posted 15 June 2012 - 07:51 PM

Hello again, everyone,

It seems I have corrected the problem using the Paste method of the WinClip class (viewtopic.php?p=498667). However, I think clipboard restoration in Autohotkey_L does not work as intended and should be changed to something similar to what the WinClip class does.

Best regards,
Stanislav

#4 Lexikos

Lexikos
  • Administrators
  • 8855 posts

Posted 16 June 2012 - 03:52 PM

Clipboard = %Clip0% does not perform any text conversions; it merely copies binary data back onto the clipboard. WinClip appears to work the same way. However, if I'm not mistaken, it discards the CF_TEXT and CF_OEMTEXT formats. ClipboardAll saves the first text format, which may be CF_TEXT, CF_OEMTEXT or CF_UNICODETEXT. It sounds like you're getting (non-Unicode) CF_TEXT, and in the process of saving and restoring it, the locale information is lost.

I'll look into this some more later.

#5 Lexikos

Lexikos
  • Administrators
  • 8855 posts

Posted 17 June 2012 - 01:15 AM

I have verified that AutoHotkey_L correctly saves and restores the original binary format of CF_TEXT, and the CF_LOCALE object which the OS uses to translate from CF_TEXT to CF_UNICODETEXT. Storing CF_TEXT with character value 128 and CF_LOCALE 1049 produces a Russian character, and saving and restoring the clipboard reproduces it correctly.
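
The byte-to-character mapping described above can be checked outside Windows. What follows is only a minimal sketch in Python, assuming that LCID 1049 (Russian) corresponds to ANSI code page 1251 and LCID 1033 (US English) to code page 1252. The table LCID_TO_ANSI_CODEPAGE and the function synthesize_unicode are illustrative names, not part of AutoHotkey or the Windows API:

```python
# Hypothetical model of how the OS might interpret CF_TEXT bytes using CF_LOCALE.
# Assumption: LCID 1049 -> code page 1251, LCID 1033 -> code page 1252.
LCID_TO_ANSI_CODEPAGE = {1033: "cp1252", 1049: "cp1251"}


def synthesize_unicode(cf_text: bytes, lcid: int) -> str:
    """Decode ANSI clipboard bytes with the code page implied by the locale."""
    return cf_text.decode(LCID_TO_ANSI_CODEPAGE[lcid])


as_russian = synthesize_unicode(bytes([128]), 1049)
as_english = synthesize_unicode(bytes([128]), 1033)

# The same byte becomes a different character depending on the locale.
print(hex(ord(as_russian)), hex(ord(as_english)))
```

The same byte value yields a Cyrillic letter under one locale and a Western character under the other, which is why CF_LOCALE has to travel together with CF_TEXT.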

What I find odd is that the input language is used by default:

When you close the clipboard, if it contains CF_TEXT data but no CF_LOCALE data, the system automatically sets the CF_LOCALE format to the current input language.
Source: Standard Clipboard Formats

Normally, non-Unicode applications use the system default ANSI code page for all strings. While the input language can change at any time, the system code page remains constant and is the same for all applications until the OS restarts (at which time it can change). So I don't know under what circumstances ANSI text copied to the clipboard would actually be in the format defined by the input language rather than the system code page.

That aside, as far as I can tell, AutoHotkey does not affect the interpretation or binary value of CF_TEXT. However, if an application copies both CF_TEXT (first) and CF_UNICODETEXT (second), only the first format is kept. It seems likely that the CF_TEXT data would be in the system code page, which (if set to US English) could not contain Russian characters, regardless of the input language. You would be able to paste the text into Unicode-aware applications up until AutoHotkey discards the CF_UNICODETEXT data.
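
To illustrate why CF_TEXT in a US English code page cannot hold Russian text, here is a small Python sketch. It assumes code page 1252 for the English system and 1251 for the Russian one, and uses an arbitrary six-letter sample word. Python's "replace" error handler substitutes "?" for unencodable characters, which is used here only as a stand-in for the question marks seen in the clipboard viewer:

```python
# Sample text is an arbitrary six-letter Russian word, not taken from the thread.
russian = "привет"

# Assumed code pages: 1252 for a US English system, 1251 for a Russian one.
as_cp1252 = russian.encode("cp1252", errors="replace")
as_cp1251 = russian.encode("cp1251")

# Decoding the English-code-page bytes cannot recover the original letters.
round_trip_1252 = as_cp1252.decode("cp1252")
round_trip_1251 = as_cp1251.decode("cp1251")

print(round_trip_1252)
print(round_trip_1251)
```

Once only the lossy CF_TEXT bytes remain, any CF_UNICODETEXT synthesized from them can contain nothing better than the question marks.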

Now, I don't have a clue why any application would store both formats explicitly, since the system does automatic conversion. I would like to know:

  • Which clipboard formats you observed before and after performing the assignment.
  • The actual binary data/encoding of each text format on the clipboard.
  • Where you are copying from and pasting to.

#6 stasok

stasok
  • Guests

Posted 17 June 2012 - 07:19 PM

Hello, Lexikos,

First, thanks for the excellent Autohotkey_L.

Here is what I did and what happened:

- I used Microsoft Word

- When I copy something to clipboard, the following clipboard formats appear on the clipboard:
Rich Text Format, HTML Format, Text (??????), Locale Identifier (09 04 00 00 binary - when current input language is English, 19 04 00 00 - Russian), Unicode Text Format (russian_text_6_letters), OEM Text (??????)
Other formats include: Ole Private Data, Hyperlink, HyperlinkWordBkmk, ObjectLink, Link Source Descriptor, Link Source, OwnerLink, Native, Embed Source, Object Descriptor, DataObject, Metafile Picture Format, Enhanced Metafile

When I run the code that stores clipboardall to a variable and then restores from that variable, I get this:
Rich Text Format (OK), HTML Format (OK), Text (??????), Locale Identifier (09 04 00 00 binary - when current input language is English, 19 04 00 00 - Russian), Unicode Text Format (??????), OEM Text (??????)
Also retained are DataObject, Object Descriptor, HyperlinkWordBkmk, Hyperlink, Ole Private Data. Other formats disappear.

Question marks show 6 Russian letters as they appear in Free Clipboard Viewer.

In other words, Unicode Text Format becomes equal to Text format after the assignment.

I tried running the code when the Input language was either Russian or English, but the result was the same. Only the Locale Identifier clipboard format changed as described above.
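
For reference, the Locale Identifier bytes quoted above can be read as a little-endian 32-bit integer. A minimal Python sketch follows; the function name lcid_from_clipboard_bytes is made up for illustration:

```python
def lcid_from_clipboard_bytes(raw: bytes) -> int:
    """Interpret the first four bytes of CF_LOCALE data as a little-endian LCID."""
    return int.from_bytes(raw[:4], "little")


english_lcid = lcid_from_clipboard_bytes(bytes.fromhex("09040000"))
russian_lcid = lcid_from_clipboard_bytes(bytes.fromhex("19040000"))

print(english_lcid, hex(english_lcid))
print(russian_lcid, hex(russian_lcid))
```

So 09 04 00 00 is LCID 0x0409 (1033, US English) and 19 04 00 00 is LCID 0x0419 (1049, Russian).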

When I used Notepad, it copied Text (??????), Unicode Text Format (6 Russian letters), OEM Text (??????) and Locale Identifier (19 04 00 00 - for some reason different from above). After the assignment, the clipboard contents are the same: no clipboard corruption occurs.

However, when I integrated WinClip class, wc.Snap(data) and wc.Restore(data) store and restore clipboard data more accurately. I only lose OwnerLink format after the assignment, and Unicode Text Format contains the original text after restoring.

Thanks a lot,

Best regards,

Stanislav

#7 stasok

stasok
  • Guests

Posted 17 June 2012 - 07:22 PM

Lexikos, a small correction to my previous post:

In the case of Notepad, Locale Identifier contains 09 04 00 00 when current input language is English, 19 04 00 00 when input language is Russian, just like in Word.

Regards,
Stanislav

#8 Lexikos

Lexikos
  • Administrators
  • 8855 posts

Posted 17 June 2012 - 09:39 PM

Thanks, you've confirmed my suspicion. Word puts CF_TEXT before CF_UNICODETEXT, whereas Notepad doesn't. (Confirmed on my Windows 7 system with Word 2010 and a script calling EnumClipboardFormats.)

FYI, some detailed comments from the source code:

// EnumClipboardFormats() retrieves all formats, including synthesized formats that don't
// actually exist on the clipboard but are instead constructed on demand. Unfortunately,
// there doesn't appear to be any way to reliably determine which formats are real and
// which are synthesized (if there were such a way, a large memory savings could be
// realized by omitting the synthesized formats from the saved version). One thing that
// is certain is that the "real" format(s) come first and the synthesized ones afterward.
// However, that's not quite enough because although it is recommended that apps store
// the primary/preferred format first, the OS does not enforce this. For example, testing
// shows that the apps do not have to store CF_UNICODETEXT prior to storing CF_TEXT,
// in which case the clipboard might have inaccurate CF_TEXT as the first element and
// more accurate/complete (non-synthesized) CF_UNICODETEXT stored as the next.
// In spite of the above, the below seems likely to be accurate 99% or more of the time,
// which seems worth it given the large savings of memory that are achieved, especially
// for large quantities of text or large images. Confidence is further raised by the
// fact that MSDN says there's no advantage/reason for an app to place multiple formats
// onto the clipboard if those formats are available through synthesis.
// And since CF_TEXT always(?) yields synthetic CF_OEMTEXT and CF_UNICODETEXT, and
// probably (but less certainly) vice versa: if CF_TEXT is listed first, it might certainly
// mean that the other two do not need to be stored. There is some slight doubt about this
// in a situation where an app explicitly put CF_TEXT onto the clipboard and then followed
// it with CF_UNICODETEXT that isn't synthesized, nor does it match what would have been
// synthesized. However, that seems extremely unlikely (it would be much more likely for
// an app to store CF_UNICODETEXT *first* followed by custom/non-synthesized CF_TEXT, but
// even that might be unheard of in practice). So for now -- since there is no documentation
// to be found about this anywhere -- it seems best to omit some of the most common
// synthesized formats:
// CF_TEXT is the first of three text formats to appear: Omit CF_OEMTEXT and CF_UNICODETEXT.
// (but not vice versa since those are less certain to be synthesized)
// (above avoids using four times the amount of memory that would otherwise be required)
// UPDATE: Only the first text format is included now, since MSDN says there is no
// advantage/reason to having multiple non-synthesized text formats on the clipboard.

Typical of Microsoft not to follow their own recommendations: using multiple text formats and putting the least preferred one first.

I see two possible solutions:
  • Always store only CF_UNICODETEXT, assuming any applications requesting CF_TEXT will get a correctly-synthesized value.
  • Always store all text formats, despite the increased memory usage.
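
The difference between the current behaviour and the two options can be sketched with a toy model in Python. This is not AutoHotkey's actual implementation: the function names are invented, clipboard formats are represented as plain strings, and the Notepad ordering is an assumption (the thread only establishes that Notepad does not put CF_TEXT first):

```python
TEXT_FORMATS = ("CF_TEXT", "CF_OEMTEXT", "CF_UNICODETEXT")


def save_first_text_only(formats):
    """Behaviour described in the thread: keep only the first text format."""
    kept, seen_text = [], False
    for name in formats:
        if name in TEXT_FORMATS:
            if seen_text:
                continue  # later text formats are assumed to be synthesized
            seen_text = True
        kept.append(name)
    return kept


def save_unicode_only(formats):
    """Option 1: keep CF_UNICODETEXT as the only text format when present."""
    if "CF_UNICODETEXT" not in formats:
        return save_first_text_only(formats)
    return [n for n in formats if n not in TEXT_FORMATS or n == "CF_UNICODETEXT"]


def save_all_text(formats):
    """Option 2: keep every format, at the cost of more memory."""
    return list(formats)


# Word's order as reported in the thread; Notepad's order is assumed.
word = ["Rich Text Format", "HTML Format", "CF_TEXT", "CF_LOCALE",
        "CF_UNICODETEXT", "CF_OEMTEXT"]
notepad = ["CF_UNICODETEXT", "CF_LOCALE", "CF_TEXT", "CF_OEMTEXT"]

print(save_first_text_only(word))     # CF_UNICODETEXT is lost
print(save_first_text_only(notepad))  # CF_UNICODETEXT survives
print(save_unicode_only(word))        # CF_UNICODETEXT survives
```

With Word's ordering, the first-text-only rule keeps the lossy CF_TEXT and drops the accurate CF_UNICODETEXT, while either proposed option keeps the Unicode data.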

#9 stasok

stasok
  • Guests

Posted 18 June 2012 - 05:57 AM

Lexikos, thanks for your reply!

It does seem that Microsoft has problems in their handling of the clipboard. I'll stick with WinClip for now because it seems to keep the clipboard almost untouched.

Although my case is a bit rare, do you intend to change the way Autohotkey_L restores the clipboard? Maybe it's better to just put all clipboard formats back and not drop CF_UNICODETEXT?

Best regards,
Stanislav

#10 Lexikos

Lexikos
  • Administrators
  • 8855 posts

Posted 18 June 2012 - 07:30 AM

Nothing is dropped when restoring the clipboard. As I said earlier, ClipboardAll saves only the first text format (i.e. it depends on the order in which data was placed onto the clipboard). The next release will unconditionally save CF_UNICODETEXT.

#11 stasok

stasok
  • Guests

Posted 18 June 2012 - 04:00 PM

Great. Thanks for your time, Lexikos!