Auto-attach the source URL when copying from a webpage


10 replies to this topic

#1 amnesiac

  • Members
  • 124 posts

Posted 08 July 2012 - 06:33 AM

When I copy something from a webpage, I often copy and paste twice: once for the content and once for its source URL. This script simplifies that.
It is browser-independent, and it also works when copying from CHM help files or other HTML pages.

Note: Lexikos's script for the same purpose (see his replies below) does this job better. Strongly recommended!

Requirement: Bin2Hex(), included below (thanks, PhiLho!)
$^v:: ; "$" uses the keyboard hook so the Send below can't retrigger this hotkey
Send, ^v
binClipData := ClipboardAll
Bin2Hex(hexClipData, binClipData) ; Convert the raw data to a HEX string
if (SubStr(hexClipData, 1, 8) <> "09C00000") ; Check whether the clipboard data came from a webpage
  return
iFoundPos := RegExMatch(hexClipData, "0D0A536F7572636555524C3A(.+?)0D0A", Match) ; "0D0A536F7572636555524C3A" is the HEX of "`r`nSourceURL:"
if (Match1 = "")
  return
hexSource := Match1
sSource := ""
Loop % StrLen(hexSource) // 2 ; Convert the HEX string back to characters, two digits per byte
  sSource .= Chr("0x" SubStr(hexSource, A_Index * 2 - 1, 2))
;~ SendInput, `nSource: %sSource% ; Pasting via the clipboard works better.
Clipboard := "`nSource: " sSource
Send, ^v
Sleep, 250 ; Give the target window time to paste before restoring
Clipboard := binClipData ; Restore the clipboard
return

;~ Source URL: http://www.autohotkey.com/community/viewtopic.php?t=7549
/*
// Convert raw bytes stored in a variable to a string of hexa digit pairs.
// Convert either byteNb bytes or, if null, the whole content of the variable.
//
// Return the number of converted bytes, or -1 if error (memory allocation)
*/
Bin2Hex(ByRef @hex, ByRef @bin, _byteNb=0)
{
  local intFormat, dataSize, dataAddress, granted, x

  ; Save original integer format
  intFormat = %A_FormatInteger%
  ; For converting bytes to hex
  SetFormat Integer, Hex

  ; Get size of data
  dataSize := VarSetCapacity(@bin)
  If (_byteNb < 1 or _byteNb > dataSize)
  {
    _byteNb := dataSize
  }
  dataAddress := &@bin
  ; Make enough room (faster)
  granted := VarSetCapacity(@hex, _byteNb * 2)
  if (granted < _byteNb * 2)
  {
  ; Cannot allocate enough memory
    ErrorLevel = Mem=%granted%
    Return -1
  }
  Loop %_byteNb%
  {
    ; Get byte in hexa
    x := *dataAddress + 0x100
    StringRight x, x, 2   ; 2 hex digits
    StringUpper x, x
    @hex = %@hex%%x%
    dataAddress++   ; Next byte
  }
  ; Restore original integer format
  SetFormat Integer, %intFormat%

  Return _byteNb
}
16-07-2012 edit: added the link to Lexikos's script.
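For readers who want to see the first script's hex-search idea outside AutoHotkey, here is a rough Python sketch (Python is used only for illustration; `bin2hex` and `source_url_from_hex` are hypothetical names). It dumps bytes as uppercase hex like Bin2Hex(), looks for the hex of CRLF + "SourceURL:", and decodes the byte pairs up to the next "0D0A". One deliberate change from the AutoHotkey regex: a hit only counts if it starts on an even offset, because in a hex string a pattern can otherwise straddle two bytes.

```python
import re

# Hex encoding of b"\r\nSourceURL:", the marker the AutoHotkey script looks for.
MARKER_HEX = "0D0A536F7572636555524C3A"

def bin2hex(data: bytes) -> str:
    """Two uppercase hex digits per byte, like Bin2Hex()."""
    return data.hex().upper()

def source_url_from_hex(hex_data: str):
    """Return the SourceURL decoded from a hex dump, or None if absent."""
    # Consume whole byte pairs lazily until the next CRLF ("0D0A").
    pattern = re.compile(MARKER_HEX + r"((?:[0-9A-F]{2})+?)0D0A")
    pos = 0
    while True:
        m = pattern.search(hex_data, pos)
        if m is None:
            return None
        if m.start() % 2 == 0:  # aligned on a byte boundary
            return bytes.fromhex(m.group(1)).decode("utf-8", errors="replace")
        pos = m.start() + 1  # misaligned hit; keep looking
```

For example, `source_url_from_hex(bin2hex(b"Version:0.9\r\nSourceURL:http://example.com/\r\n"))` gives `"http://example.com/"`.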

#2 dylan904

  • Members
  • 706 posts

Posted 09 July 2012 - 09:35 PM

Very nice for cross-browser! If you enjoy the simplicity of Google Chrome, you could use a small script like this...
SetTitleMatchMode, 2

$^c::
Send, ^c
IfWinActive, Google Chrome
{
   HeldURL := True
   ControlGetText, Text, Chrome_OmniboxView1, A
}
Else
   HeldURL := False
Return

$^v::
If (HeldURL)
{
   Clipboard := Text . ": " . Clipboard
   HeldURL := False
   SendInput, ^v
}
Else
   SendInput, ^v
Return


#3 amnesiac

  • Members
  • 124 posts

Posted 10 July 2012 - 12:51 AM

Quoting dylan904: "Very nice for cross-browser! If you enjoy the simplicity of Google Chrome, you could use a small script like this..."

Thank you. For that case, though, I prefer context-sensitive hotkeys:
SetTitleMatchMode, 2

#IfWinActive Google Chrome ; I'd rather match Chrome's window class than its title here.
$^c::
Send, ^c
HeldURL := True
ControlGetText, Text, Chrome_OmniboxView1, A
Return

#If HeldURL
$^v::
Clipboard := Text . ": " . Clipboard ; The copied object's formatting is lost here.
HeldURL := False
SendInput, ^v
Return
Sorry, I don't use Chrome now, so I can't test it.

#4 Lexikos

  • Administrators
  • 8845 posts

Posted 12 July 2012 - 05:20 AM

The code in your first post is compatible with AutoHotkey 1.0. Do you wish to keep this thread in the Custom forum regardless?

Actually, it appears Bin2Hex was written for an old version:
• VarSetCapacity should be doubled for Unicode, though it probably won't affect performance much anyway.
• It uses the obscure *deref operator instead of NumGet.
• It uses SetFormat Integer, which disables binary number write-caching. SetFormat IntegerFast should be used instead.
The script checks for format 0xC009, which appears to be "DataObject". However, I don't think it's guaranteed by any official documentation to have that value. Furthermore, that's actually not the clipboard format you're retrieving data from; it just happens to be listed first.

You're actually retrieving the SourceURL from the "HTML Format" data object. The numerical code for this format can be determined as follows:
CF_HTML := DllCall("RegisterClipboardFormat", "str", "HTML Format")
The following demonstrates how to loop through the formats in ClipboardAll, and also serves as a way to inspect what formats are on the clipboard:
~^c::
Sleep 250
bin := ClipboardAll
n := 0
s := ""
VarSetCapacity(format_name, 200)
; For each format...
while format := NumGet(bin, n, "uint")
{
    if s !=
        s .= "`n`n"
    ; Get name of registered format.
    if (format >= 0xC000 && format <= 0xFFFF)
        if DllCall("GetClipboardFormatName", "uint", format, "str", format_name, "int", 100)
            format := format_name
    s .= "*** format=" format "`n"
    size := NumGet(bin, n+4, "uint")
    ; Retrieve all ASCII characters (excluding control characters).
    Loop % size {
        byte := NumGet(bin, n+7+A_Index, "uchar")
        if (byte >= 32 && byte <= 127 || byte = 10 || byte = 13)
            s .= Chr(byte)
    }
    ; Advance to next stored format.
    n += 8 + size
}
MsgBox %s%
return
Alternatively, you can retrieve the HTML Format data directly using SKAN's ClipboardGet_HTML(), then use RegExMatch to get the SourceURL.
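The loop above walks the ClipboardAll layout described in the help: a four-byte format type, a four-byte size, then that many bytes of data, repeated until a format of 0. As a language-neutral illustration, here is a Python sketch of the same walk (the helper names are hypothetical; little-endian byte order and the zero terminator are assumptions that match how the AutoHotkey loop reads the data).

```python
import struct

def iter_formats(blob: bytes):
    """Yield (format_id, data) pairs from a ClipboardAll-style blob.

    Assumed layout: repeated [uint32 format][uint32 size][size bytes of data],
    little-endian, stopping at a format ID of 0 or at the end of the blob.
    """
    n = 0
    while n + 8 <= len(blob):
        fmt, size = struct.unpack_from("<II", blob, n)
        if fmt == 0:
            break
        yield fmt, blob[n + 8 : n + 8 + size]
        n += 8 + size  # advance to the next stored format

def printable(data: bytes) -> str:
    """Keep bytes 32..127 plus LF/CR, as the AutoHotkey inspection loop does."""
    return "".join(chr(b) for b in data if 32 <= b <= 127 or b in (10, 13))

def pack_formats(pairs) -> bytes:
    """Build a blob in the same assumed layout, ending with a zero format ID."""
    body = b"".join(struct.pack("<II", fmt, len(data)) + data for fmt, data in pairs)
    return body + struct.pack("<I", 0)
```

Usage: `for fmt, data in iter_formats(blob): print(hex(fmt), printable(data))` mirrors the MsgBox dump above.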


... or just continue doing whatever works for you.

#5 amnesiac

  • Members
  • 124 posts

Posted 12 July 2012 - 11:41 AM

Quoting Lexikos: "The code in your first post is compatible with AutoHotkey 1.0. Do you wish to keep this thread in the Custom forum regardless?"

Since I use AutoHotkey_L all the time and hadn't tested it with AutoHotkey Basic, posting it here seemed the safe choice. If convenient, please move it to the appropriate forum.

Quoting Lexikos: "Actually, it appears Bin2Hex was written for an old version..."

I have written the following functions to replace Bin2Hex() and Hex2Bin(), but they may have some problems. Can you help me to improve them?
/*
http://www.autohotkey.com/community/viewtopic.php?f=2&t=88529&p=549144#p550352
Raw2Hex() converts raw data to a HEX string.
  raw    The variable containing the raw data.
  hex    The variable that receives the HEX string.
  bytes  The number of bytes to convert (0 or out of range = all).
Returns the number of bytes converted.
*/
Raw2Hex(ByRef raw, ByRef hex, bytes=0)
{
  iIntFormat := A_FormatInteger
  SetFormat, IntegerFast, hex
  
  iCapacity := VarSetCapacity(raw)
  if (bytes < 1 or bytes > iCapacity)
    bytes := iCapacity

  VarSetCapacity(hex, bytes * 2 * (A_IsUnicode ? 2 : 1), 0) ; Two characters per byte; characters are two bytes wide in Unicode builds
  Loop, % bytes
  {
    iByte := NumGet(raw, A_Index - 1, "UChar")
    sByte := SubStr(iByte, 3)
    hex .= (StrLen(sByte) = 1) ? "0" sByte : sByte
  }

  SetFormat, IntegerFast, %iIntFormat%
  return, bytes
}

/*
Hex2Raw() converts a HEX string to raw data.
  hex  The HEX string to be converted.
  raw  The variable that receives the raw data.
Returns the number of bytes converted.
*/
Hex2Raw(hex, ByRef raw)
{
  iBytes := 0 ; Return 0 rather than blank for an empty string
  VarSetCapacity(raw, StrLen(hex) // 2, 0)
  Loop, % StrLen(hex) // 2
  {
    iByte := "0x" SubStr(hex, A_Index * 2 - 1, 2) ; e.g. "0x4F"
    NumPut(iByte, raw, A_Index - 1, "UChar")
    iBytes := A_Index
  }
  return, iBytes
}
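When testing Raw2Hex() and Hex2Raw(), it can help to have a reference for the expected output. A Python sketch of the same two conversions (the names `raw2hex` and `hex2raw` are hypothetical; uppercase output follows Bin2Hex(), and `bytes.fromhex()` raises ValueError on malformed input such as an odd number of digits rather than dropping anything):

```python
def raw2hex(raw: bytes, nbytes: int = 0) -> str:
    """Mirror of Raw2Hex(): convert nbytes bytes (all if < 1 or too large)."""
    if nbytes < 1 or nbytes > len(raw):
        nbytes = len(raw)
    return raw[:nbytes].hex().upper()

def hex2raw(hex_str: str) -> bytes:
    """Mirror of Hex2Raw(): every two hex digits become one byte."""
    return bytes.fromhex(hex_str)
```

A round trip over all 256 byte values, `hex2raw(raw2hex(bytes(range(256))))`, is a quick check for the non-ASCII problem that the "UChar" change addresses.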

Quoting Lexikos: "The script checks for format 0xC009..."

Thanks for pointing that out.
It came from my crude approach: the help says "A saved clipboard file internally consists of a four-byte format type, followed by a four-byte data-block size, followed by the data-block for that format. ", and I found that the first four bytes of ClipboardAll were "09C00000" whenever I copied something from a webpage.
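The "09C00000" in the hex dump is the four-byte format type stored little-endian (lowest byte first, as on x86), which is why it corresponds to the 0xC009 Lexikos mentions. A quick Python check of the byte order, used here only for illustration:

```python
import struct

# The first four bytes of ClipboardAll, as Bin2Hex() printed them.
first_dword = bytes.fromhex("09C00000")

# Little-endian unsigned 32-bit integer: the lowest byte comes first.
fmt = struct.unpack("<I", first_dword)[0]

print(hex(fmt))                 # 0xc009
print(0xC000 <= fmt <= 0xFFFF)  # True
```

The value falls in the 0xC000 to 0xFFFF range that Lexikos's loop treats as registered formats. Registered format numbers are handed out by the system rather than fixed, which is presumably why the exact value isn't guaranteed.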

What should I do if ClipboardAll contains several "HTML Format" blocks (i.e. several sources) at once?

16-07-2012 edit: adjusted the order of the raw and hex parameters for Raw2Hex() and Hex2Raw().
14-07-2012 fix: changed the data type for NumPut() and NumGet() to "UChar" to avoid problems with non-ASCII characters.
13-07-2012 edit: changed Raw2Hex()'s raw parameter to ByRef. Thanks to Lexikos for the tip.

#6 Lexikos

  • Administrators
  • 8845 posts

Posted 12 July 2012 - 09:24 PM

The raw parameter in Raw2Hex should be ByRef to properly support binary data.

The clipboard can only contain one instance of each format at a time.

I have moved the topic.

#7 amnesiac

  • Members
  • 124 posts

Posted 13 July 2012 - 03:00 AM

Thank you very much. I've updated the function.
I've also improved the script from the first post. Could you look it over?
$^v::
Send, ^v
bin := ClipboardAll
n := 0
s := ""
VarSetCapacity(format_name, 200)
while format := NumGet(bin, n, "uint")
{
  size := NumGet(bin, n+4, "uint")
  ; Get name of registered format.
  if (format >= 0xC000 && format <= 0xFFFF) and DllCall("GetClipboardFormatName", "uint", format, "str", format_name, "int", 100) and (format_name = "HTML Format")
  {
    Loop % size
    {
      byte := NumGet(bin, n+7+A_Index, "uchar") ; This may be a problem if the clipboard contains Unicode characters or URL encoding (e.g. %20).
      if (byte >= 32 && byte <= 127 || byte = 10 || byte = 13)
        s .= Chr(byte)
    }
    RegExMatch(s, "SourceURL:(?P<Source>.*)", Match)
    break
  }

  ; Advance to next stored format.
  n += 8 + size
}
if !MatchSource
  return

Clipboard := "`nSource: " MatchSource
Sleep, 250
Send, ^v
Clipboard := bin
return
Is it only suitable for AutoHotkey_L?

#8 Lexikos

  • Administrators
  • 8845 posts

Posted 13 July 2012 - 08:58 AM

The code I wrote was not intended to be used directly for your purpose.
• Rather than retrieving each format name, you should call RegisterClipboardFormat (as previously shown) once to retrieve the numeric format code, and compare that to format within the loop. (On the other hand, retrieving the name might be more convenient if you're looking for one of several formats.)
• Rather than manually retrieving the ASCII characters contained within the data (which was only done to show readable text within the other formats), use StrGet(&bin+n+8, size, "UTF-8") to retrieve the string. Although it's probably only going to contain ASCII characters, MSDN states that the HTML clipboard format uses UTF-8.
• My previous code is compatible with AutoHotkey 1.0 and AutoHotkey_L.
~^v::
Sleep 100
CF_HTML := DllCall("RegisterClipboardFormat", "str", "HTML Format")
bin := ClipboardAll
n := 0
while format := NumGet(bin, n, "uint")
{
    size := NumGet(bin, n + 4, "uint")
    if (format = CF_HTML)
    {
        html := StrGet(&bin + n + 8, size, "UTF-8")
        RegExMatch(html, "(*ANYCRLF)SourceURL:\K.*", sourceURL)
        break
    }
    n += 8 + size
}
if !sourceURL
    return
Clipboard := "`nSource: " sourceURL
Send ^v
Sleep 250
Clipboard := bin
return
More notes:
• This code is compatible with both AutoHotkey versions, but AutoHotkey 1.0 users need StrGet.ahk.
• Instead of blocking Ctrl+V and then sending it, I let it through and give the target window some time to handle it.
• The header lines in the HTML Format may be delimited by `r, `n or `r`n, hence (*ANYCRLF).
• There should never be any need to sleep between Clipboard := and Send ^v, since the assignment does not return until the data has been stored on the clipboard. However, there is sometimes a need to sleep after Send ^v to let the target window paste before you reset the Clipboard.
• IE and Firefox place an additional text format on the clipboard, containing just the source URL. However, this only covers two browsers, and the name of the format is browser-specific, so extracting the source URL from the HTML data is a better option.
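The two details in the notes above (UTF-8 decoding, and header lines ending in `r, `n or `r`n) can be sketched in Python as well. This is an illustration only, not a port of the hotkey: `source_url` is a hypothetical name, it takes the raw "HTML Format" bytes (what StrGet reads from `&bin + n + 8`), and, like the AutoHotkey regex, it takes the first "SourceURL:" it finds.

```python
import re

def source_url(cf_html: bytes):
    """Return the SourceURL header value from CF_HTML bytes, or None.

    CF_HTML is UTF-8, and header lines may end in CR, LF or CRLF, so the
    pattern accepts any of them (the job (*ANYCRLF) does in AutoHotkey).
    """
    text = cf_html.decode("utf-8", errors="replace")
    # Header at the start of the data or right after any line break;
    # the value runs up to the next CR or LF.
    m = re.search(r"(?:^|[\r\n])SourceURL:([^\r\n]*)", text)
    return m.group(1) if m else None
```

Python's `re.MULTILINE` is avoided on purpose: its `^` only matches after `\n`, so a header delimited by a lone `\r` would be missed.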

#9 amnesiac

  • Members
  • 124 posts

Posted 13 July 2012 - 09:49 AM

It is perfect and the explanation is very comprehensive. I am grateful.

Quoting Lexikos: "The raw parameter in Raw2Hex should be ByRef to properly support binary data."

But the help says:

In v1.0.46+, binary-clipboard variables may be passed to functions by value (formerly they only worked ByRef).

Were you thinking of compatibility with older versions?

#10 Lexikos

  • Administrators
  • 8845 posts

Posted 13 July 2012 - 11:53 AM

Binary clipboard data is the exception to the rule. I had forgotten it, but even so, what I said is accurate. If the function was named BinaryClip2Hex rather than Raw2Hex, one might not expect it to work correctly with any "raw" data.

Using ByRef also avoids unnecessarily copying the data.

#11 amnesiac

  • Members
  • 124 posts

Posted 13 July 2012 - 11:31 PM

When I use your last script and copy something from the AutoHotkey help file in its default directory, the source is mk:@MSITStore:c:\Program%20Files\AutoHotkey\AutoHotkey.chm::/docs/AutoHotkey.htm. Should I do further processing (i.e. replace each %xx hexadecimal escape with its ASCII equivalent)?
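The %xx replacement asked about here is ordinary percent-decoding. A minimal Python illustration using the standard `urllib.parse.unquote` (Python is only used to show the expected result). One caveat: non-ASCII characters are percent-encoded as several UTF-8 bytes (e.g. "%C3%A9" is "é"), so turning each %xx into a single character on its own would garble them; `unquote` decodes the byte sequence as UTF-8 by default. Whether to decode at all depends on whether you want a readable path or a URL you can paste back into a browser.

```python
from urllib.parse import unquote

src = r"mk:@MSITStore:c:\Program%20Files\AutoHotkey\AutoHotkey.chm::/docs/AutoHotkey.htm"

# unquote() turns each %xx escape back into the character it encodes,
# reading multi-byte sequences as UTF-8 by default.
print(unquote(src))
# mk:@MSITStore:c:\Program Files\AutoHotkey\AutoHotkey.chm::/docs/AutoHotkey.htm
```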

Another question: is there a simple way to merge two binary-clipboard variables? I'd like the result to include all of their contents, using one of the formats (if they differ).
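Since the clipboard holds only one instance of each format (as Lexikos noted above), a merge has to pick one data block per format. One possible approach, sketched in Python under stated assumptions: the ClipboardAll layout from the help (four-byte format, four-byte size, data) plus a four-byte zero terminator, little-endian. It keeps every format from the first blob and adds only the formats the first one lacks. It does not combine content within a format (for example, joining two texts); that would need format-specific handling. Whether AutoHotkey accepts such a hand-built blob when assigned back to Clipboard is not verified here, and `split_formats`/`merge_formats` are hypothetical names.

```python
import struct

def split_formats(blob: bytes):
    """Return [(format_id, data), ...] from a ClipboardAll-style blob (assumed layout)."""
    out, n = [], 0
    while n + 8 <= len(blob):
        fmt, size = struct.unpack_from("<II", blob, n)
        if fmt == 0:  # assumed terminator
            break
        out.append((fmt, blob[n + 8 : n + 8 + size]))
        n += 8 + size
    return out

def merge_formats(first: bytes, second: bytes) -> bytes:
    """Keep all of first's formats; add second's only if the format ID is new."""
    merged = split_formats(first)
    seen = {fmt for fmt, _ in merged}
    merged += [(fmt, data) for fmt, data in split_formats(second) if fmt not in seen]
    body = b"".join(struct.pack("<II", fmt, len(data)) + data for fmt, data in merged)
    return body + struct.pack("<I", 0)  # assumed zero terminator
```

Which blob wins for shared formats is a design choice; swapping the arguments makes the second one take precedence.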

PS: In Raw2Hex() and Hex2Raw(), I changed the data type for NumPut() and NumGet() to "UChar" to avoid problems with non-ASCII characters.