I need to find out which specific special character is being copied into the clipboard

Nixcalo
Posts: 116
Joined: 06 Feb 2018, 04:24

27 Apr 2020, 02:38

Hi people:

I have a ridiculous problem I am not able to solve. I am an English-Spanish translator working online in a cloud translation application. I work in two columns: English and Spanish. In my English source text I often find "tags" (which mean something, such as a placeholder, italics, bold, or whatever).

Such as this:
[Image: Sin título.png]
Now the difficulty.

If I press Control-C in the original English text, that "tag" (which is represented by a {1} but it's not really those characters) is copied as well. If I then paste that text in my Spanish translation, the tiny graphic tag appears in my Spanish text.

However, if I paste anywhere else (Notepad, Notepad++, etc.), those "tags" are lost.

So I don't know what they are, whether they are an image or a special character that does not appear in Notepad. NO IDEA HOW THEY ARE KEPT IN THE CLIPBOARD!

And my problem is that I need to process the clipboard (particularly, replace some strings with RegExReplace and such) but, if I do this, these tags are destroyed. I would like to know how they are stored, so I could "rebuild" them after performing some string replacements.

But as I don't know how these tags are stored in the clipboard, I cannot reconstruct them. So with any RegExReplace or StrReplace operation, those tags are destroyed.

Any ideas???
Nixcalo

Update:
When I paste in Word, a weird "blank image" appears. I don't really know what it is: it looks like a kind of blank image, but it does not let you do anything with it except change its size (so I don't really know if it's an image). If I copy the text from the browser (where the cloud translation app is) to Word, and then back to the browser, the tag is somehow lost in the process. So yes, I don't know what the heck is stored in the clipboard or how to reproduce it via SendInput.
list
Posts: 222
Joined: 26 Mar 2014, 14:03

27 Apr 2020, 03:26

Let's assume it is HTML you are copying to the clipboard, so when you modify the content it becomes text and all formatting is lost. The trick would be to get the HTML code "as text" into a variable, modify that variable, and set that variable as HTML in the clipboard. For that you need WinClip() - see method four here https://www.autohotkey.com/boards/viewtopic.php?f=7&t=8977 (download the WinClip files first)

Here is an example of how to get the HTML code into a variable which you can modify: ClipData:=GetHTML(WinClip.GetHTML()). So instead of using RegExReplace on the clipboard, use it on ClipData (in this example).

Get/Set HTML into the clipboard
https://www.autohotkey.com/boards/viewtopic.php?f=76&t=73899&p=322154#p322154
Nixcalo
Posts: 116
Joined: 06 Feb 2018, 04:24

06 May 2020, 21:57

It does not seem to be HTML. The only thing I know is that, when I paste in Word, a white box appears where every tag is supposed to be. When I copy from Word back into the browser, the tags are NOT replicated.

And if I copy from the browser to any application other than Word (including Excel), the white box does not appear.

So I am at a loss.
list
Posts: 222
Joined: 26 Mar 2014, 14:03

07 May 2020, 11:44

Have you actually tried GetHTML and WinClip? I'm pretty sure it is HTML because you say it is online; copying to Word won't help to determine it. If you can't be bothered to try GetHTML/WinClip yet, do have a peek in the clipboard using a freeware portable clipboard viewer https://www.nirsoft.net/utils/inside_clipboard.html - no doubt it will show "HTML Format" as an option after you have copied something. If it does, you know you can use GetHTML/WinClip.

But you are the only one that can help you by trying :)
Nixcalo
Posts: 116
Joined: 06 Feb 2018, 04:24

11 Jun 2020, 01:17

I am at a loss on what to do. I have been reading these posts for several days, and I am able to recover the text and extract it, but I cannot modify it and then place it back while keeping the tags in the same position.

In other words, I want to go from this
[Image: Captura.JPG]
to this
[Image: captura 2.JPG]
And I have no idea how. The numbers 1 and 2 are HTML tags, and if I make any kind of change to the text, they are destroyed. How can I use WinClip for this?

I know how to change to uppercase with the Format command, or the StringUpper, but if I apply them I destroy the tags.
Nixcalo
Posts: 116
Joined: 06 Feb 2018, 04:24

11 Jun 2020, 11:40

Hi Bobo,

I don't understand. I work with a translation tool that runs in a web browser. I don't need Word, so this article does not help. As I said, what I need is simple; what is not simple (for me!) is solving it.

I have a string that comprises some plain text and some embedded tags, which one of you helped me discover were some HTML tags. The thing is, I can get the text in the string, but if I apply some operation to it (like changing all letters to lowercase, etc.) the HTML tags are destroyed and thus they are lost.

And I wonder if there is a way to work around this. I don't know whether the solution would be to split the string in text sentences and HTML tags, apply the functions I need to the text and then reconstruct the whole text+tag+text+tag string.

The problem with this is severalfold. I don't know how many tags there are (from zero to several), or where they are placed. There might be two consecutive tags, or tags with plain text in the middle.
And I don't know how I can start to analyze a ClipBoard with HTML so I can create variables with text, and variables with HTML, in order to concatenate them.

And of course there is always the possibility that the solution is simpler.

The thing is that I have been reading all WinClip documentation, and examples, but it's not clear to me if I can extract the plain text from the HTML-based ClipBoard, apply some operation to it and then put it back keeping the tags in the same place. Actually, I am not sure what WinClip is for, as extracting the plain text from an HTML ClipBoard can be done just by other means (like using SubStr() with the whole ClipBoard).
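
One way to sketch the split-and-rebuild idea described above in AutoHotkey v1: walk the HTML with RegExMatch, copy every tag through unchanged, and transform only the text found between tags. The loop copes with any number of tags, including none. `UpperTextOnly` and the sample string are made up for illustration, and the tag pattern assumes that no attribute value contains a literal `>`.

```autohotkey
#NoEnv

; Hypothetical helper: upper-case only the text between tags, leave tags untouched.
UpperTextOnly(html)
{
    out := "", pos := 1
    ; Each match is one complete tag, e.g. <span ...>, </span> or <img ...>.
    while (found := RegExMatch(html, "<[^>]*>", tag, pos))
    {
        text := SubStr(html, pos, found - pos)   ; plain text in front of this tag
        StringUpper, text, text
        out .= text . tag                        ; the tag itself is copied as-is
        pos := found + StrLen(tag)
    }
    text := SubStr(html, pos)                    ; whatever follows the last tag
    StringUpper, text, text
    return out . text
}

; Invented sample, loosely modelled on the structure of the clipboard dump.
sample := "<span class=""text-node"">Gire la tapa </span><img data-inline-id=""1""><span class=""text-node""> hacia la izquierda.</span>"
MsgBox % UpperTextOnly(sample)
```

Character references such as `&amp;` or `&nbsp;` also sit in the text portions, so a real script would probably need to skip them before changing case.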

I'll be more specific with the exact example.
For the segment
[Image: Captura.JPG]
this is the analysis of the clipboard performed by the free utility InsideClipboard 1.15 (which is one of the recommended utilities to check whether the tags were HTML or not, as one of you suggested).

CF_UNICODETEXT

Code: Select all

Gire la tapa de presión 1 hacia la izquierda para aflojarla.
CF_TEXT

Code: Select all

00000000   47 69 72 65 20 6C 61 20 74 61 70 61 20 64 65 20    Gire la tapa de 
00000010   70 72 65 73 69 F3 6E 20 31 20 68 61 63 69 61 20    presión 1 hacia 
00000020   6C 61 20 69 7A 71 75 69 65 72 64 61 20 70 61 72    la izquierda par
00000030   61 20 61 66 6C 6F 6A 61 72 6C 61 2E 00             a aflojarla..
and the HTML Format (which must be the important thing here) which is

Code: Select all

Version:0.9
StartHTML:0000000196
EndHTML:0000003024
StartFragment:0000000232
EndFragment:0000002988
SourceURL:https://caterpillar.xtm-intl.com/workbench/?_s=5b8a3f4be2ea459eb1f38387f7755970
<html>
<body>
<!--StartFragment--><span class="text-node" style="box-sizing: border-box; white-space: pre-wrap; color: rgb(68, 68, 68); font-family: Roboto; font-size: 15px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">Gire la tapa de presión </span><img class="inline" src="https://caterpillar.xtm-intl.com/workbench/web/inline-image?size=15&amp;type=G&amp;id=1" data-inline-id="1" data-inline-type="G" style="box-sizing: border-box; border-style: none; vertical-align: middle; margin-top: -4px; color: rgb(68, 68, 68); font-family: Roboto; font-size: 15px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"><span class="text-node" style="box-sizing: border-box; white-space: pre-wrap; color: rgb(68, 68, 68); font-family: Roboto; font-size: 15px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">1</span><img class="inline" src="https://caterpillar.xtm-intl.com/workbench/web/inline-image?size=15&amp;type=G&amp;id=1" data-inline-id="1" data-inline-type="G" style="box-sizing: border-box; border-style: none; vertical-align: middle; margin-top: -4px; color: rgb(68, 68, 68); font-family: Roboto; font-size: 15px; font-style: normal; font-variant-ligatures: normal; 
font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"><span class="text-node" style="box-sizing: border-box; white-space: pre-wrap; color: rgb(68, 68, 68); font-family: Roboto; font-size: 15px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> hacia la izquierda para aflojarla.</span><!--EndFragment-->
</body>
</html>
No doubt, the important part here lies in the third-to-last line... but how the heck can I extract and modify the text in there (near the end of the line), for example by changing it to uppercase, put the new capitalized text back in the same position of the HTML Format, and then put it back in the browser so that what I get is this?
[Image: captura 2.JPG]
I had the vain hope that, if I copied the whole HTML string with a simple Control+C and pressed Control+V in the browser, the inline tags would magically appear, but no, what appears is the whole huge HTML string as raw text. In other words, this.
[Image: captura 3.JPG]
And that is not what I want, of course (note I have not bothered to capitalize the string I want).

Thank you for your help. I hope things are clearer now. Please note that there might be two tags, as in the previous example, or six, or none.

How can Winclip help here?
list
Posts: 222
Joined: 26 Mar 2014, 14:03

11 Jun 2020, 15:07

1. Download WinClip.zip from here https://www.autohotkey.com/boards/viewtopic.php?f=7&t=8977
2. Unpack the files in a folder, place the script below in the same folder so the #include works
3. Select some text and copy it to the clipboard. Now press F11; this will get the HTML code from the clipboard as text into a variable called ClipData. You can then change the text using StrReplace, RegExReplace and other AHK functions and commands, incl. lower and uppercase (RegEx also has Upper and Lower "commands").
4. To paste the HTML back press F12, this will set the ClipData text as the special "HTML Format" into the clipboard which allows pasting with formatting.

Hope it works. (the code below is the same as what I linked to btw)

Code: Select all

#NoEnv
#SingleInstance, force

F11:: ; store html as text in ClipData variable
If !WinClip.HasFormat(49351) ; no html present in clipboard
	{
	 SoundPlay, *48
	 Return
	}
ClipData:=GetHTML(WinClip.GetHTML())
ClipData:=StrReplace(ClipData,"e","_") ; replace all e with _ just to illustrate
MsgBox % ClipData ; will show you HTML code 
Return

F12:: ; restore ClipData as HTML into the clipboard
ClipSave:=ClipboardAll
Clipboard:=""
WinClip.Clear()
WinClip.SetHTML(ClipData)
Sleep 200
Send ^v
Sleep 200
Clipboard:=ClipSave
ClipSave:=""
Return

GetHTML(in)
	{
	 in:=RegExReplace(in,"iUs)^.*<htm","<htm")
	 in:=StrReplace(in,"<!--StartFragment-->")
	 in:=StrReplace(in,"<!--EndFragment-->")

	 ; if have problems with Ã, è try to uncomment these lines below 
	 ; we need to go from UTF-8 bytes to Unicode text


	 ;clipsize := StrPut(in, "CP0")
	 ;VarSetCapacity(cliptemp, clipsize)
	 ;StrPut(in, &cliptemp, "CP0")
	 ;return StrGet(&cliptemp, "UTF-8")

	 return in
	}

#Include WinClipAPI.ahk ;include this first
#Include WinClip.ahk
Nixcalo
Posts: 116
Joined: 06 Feb 2018, 04:24

11 Jun 2020, 15:17

I have been doing a lot of research and the code you posted does not work for me. However, the free clipboard utilities suggested that perhaps the line

Code: Select all

If !WinClip.HasFormat(49351)
is not correct.

The one that is correct (for me, at least) is this one

Code: Select all

If !WinClip.HasFormat(49418)
I have no idea why.

And then it works! However, I now have another problem, which perhaps might merit its own post. I am Spanish and I work with Unicode, and the HTML destroys things like tildes or unusual symbols. For example, instead of "presión", it displays "presiÃ³n", which is no good for me.

But apparently the whole problem was the WinClip.HasFormat function. I am now struggling to convert the HTML to Unicode so I get back tildes and "ñ" and such letters... Or, as we say, out of the frying pan and into the fire.

P.S. By the way, your replacement does not replace only the plain text; it replaces all e's in the HTML structure, so the HTML is basically useless after that. But I understand that what you mean is that, with a little careful tweaking, you can find the actual plain text within the HTML code and modify just that.
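
A possible way to restrict a replacement like that to the visible text is a negative lookahead that skips any match located inside a tag. This is only a sketch: the sample string is invented, and the pattern assumes attribute values never contain `<` or `>`.

```autohotkey
#NoEnv

; Invented sample: one text span followed by one inline image tag.
html := "<span class=""text-node"">Gire la tapa de </span><img class=""inline"" data-inline-id=""1"">"

; An "e" is replaced only if the next angle bracket after it is "<" (or there is none),
; which means the "e" is not sitting inside a <...> tag.
result := RegExReplace(html, "e(?![^<>]*>)", "_")

MsgBox % result
```

With this sample, the e's in "Gire" and "de" become underscores, while "text-node", "inline" and "data-inline-id" inside the tags are left alone.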
swagfag
Posts: 6222
Joined: 11 Jan 2017, 17:59

11 Jun 2020, 15:40

it destroys your tildes because the library is not implemented to spec
msdn wrote: The only character set supported by the clipboard is Unicode in its UTF-8 encoding.
and here (and possibly in other places) it converts the text to an ANSI string

Code: Select all

  GetHtml()
  {
    if !( clipSize := this._fromclipboard( clipData ) )
      return ""
    if !( out_size := this._getFormatData( out_data, clipData, clipSize, "HTML Format" ) )
      return ""
    return strget( &out_data, out_size, "CP0" )
  }
  
  iGetHtml()
  {
    this._IsInstance( A_ThisFunc )
    if !( clipSize := this._getClipData( clipData ) )
      return ""
    if !( out_size := this._getFormatData( out_data, clipData, clipSize, "HTML Format" ) )
      return ""
    return strget( &out_data, out_size, "CP0" )
  }
the encoding should have been UTF-8

Nixcalo wrote: The one that is correct (for me, at least)
correct today, incorrect tomorrow (or whenever you end up restarting your PC)
clients should call RegisterClipboardFormat (or whatever WinClip's equivalent of that is) to obtain the actual format ID (when used with an already registered format, it returns that format's ID)
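
A minimal sketch of that lookup in AutoHotkey v1, calling the Win32 functions directly through DllCall instead of hard-coding 49351 or 49418. The variable name `htmlFormat` is just for illustration.

```autohotkey
#NoEnv

; For a name that is already registered, RegisterClipboardFormat returns the existing ID.
; The number can differ between machines and Windows sessions.
htmlFormat := DllCall("RegisterClipboardFormat", "Str", "HTML Format", "UInt")

if DllCall("IsClipboardFormatAvailable", "UInt", htmlFormat)
    MsgBox % "HTML Format is on the clipboard (ID " htmlFormat " in this session)."
else
    MsgBox % "No HTML Format on the clipboard (its ID would be " htmlFormat ")."
```

In the F11 hotkey posted earlier, the same value could presumably be passed as `WinClip.HasFormat(htmlFormat)` in place of the fixed number.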
Nixcalo
Posts: 116
Joined: 06 Feb 2018, 04:24

11 Jun 2020, 16:18

list wrote (11 Jun 2020, 16:00):
Re ó: I did include commented code, did you see it?

No, I had not. It works! You are awesomeeeee!!!

Apparently I had seen some comments and discarded them. Damn! It's sometimes shameful when you are completely lost and someone helps you and makes you wonder how you could be so dumb (or how can there be so many smart and helpful people out there!)
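
For reference, the commented-out lines in GetHTML() amount to a round trip: write the wrongly decoded string back out as ANSI bytes, then read those same bytes as UTF-8. The sketch below reproduces the garbling on purpose and then undoes it. `FixUtf8` is a hypothetical name, and the script assumes a Unicode build of AutoHotkey v1.1 and a system ANSI code page (such as Windows-1252) that maps these bytes back and forth without loss.

```autohotkey
#NoEnv

; Hypothetical helper mirroring the commented-out lines in GetHTML().
FixUtf8(str)
{
    size := StrPut(str, "CP0")        ; bytes needed, including the terminator
    VarSetCapacity(buf, size)
    StrPut(str, &buf, "CP0")          ; write the original bytes back out
    return StrGet(&buf, "UTF-8")      ; decode them with the encoding the spec requires
}

; Build a test string without relying on the script file's own encoding.
original := "presi" . Chr(0xF3) . "n"   ; Chr(0xF3) is the letter o with acute accent

; Reproduce the problem: UTF-8 bytes read back as ANSI (CP0).
size := StrPut(original, "UTF-8")
VarSetCapacity(bytes, size)
StrPut(original, &bytes, "UTF-8")
garbled := StrGet(&bytes, "CP0")

MsgBox % "Garbled: " garbled "`nFixed: " FixUtf8(garbled)
```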

Thanks everybody! You are great. I now have to pull together all the information I have gathered in the last week, but I think I am on the right track thanks to you guys.
