Cleaning Google Translate URLs (with RegEx) Topic is solved

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
sn1perwild
Posts: 70
Joined: 04 Aug 2021, 15:11

Cleaning Google Translate URLs (with RegEx)

27 Apr 2022, 14:02

Hello all!

I routinely deal with web pages translated by Google Translate, which has the habit of hijacking the page's links by wrapping them in its own translate.google.com URLs. I'm trying to create a script that will "clean" the selected URLs back to the originals and replace them.

For example, here are a couple of URLs I took from these pages:

Code:

https://translate.google.com/website?sl=en&tl=pt&hl=pt-BR&client=webapp&u=https://www.google.com/url?q%3Dhttps://www.wired.com/gallery/best-office-chairs/%2523intcid%253D_wired-right-rail_99d755e0-3dd4-453d-8744-3ddec6243313_popular4-1-reranked-by-vidi%26sa%3DD%26source%3Deditors%26ust%3D1651087654369297%26usg%3DAOvVaw3CD9UQFJ7rq6uxGd800vVi

https://translate.google.com/website?sl=pt&tl=en&hl=pt-BR&client=webapp&u=https://www.google.com/url?q%3Dhttps://news.un.org/pt/story/2019/05/1671401%26sa%3DD%26source%3Deditors%26ust%3D1651086721941906%26usg%3DAOvVaw0uXAohDouhPpuo56vvX8SC

https://translate.google.com/website?sl=en&tl=pt&hl=pt-BR&client=webapp&u=https://www.google.com/url?q%3Dhttps://www.wired.com/story/elon-musk-buys-twitter-deal/%2523intcid%253D_wired-right-rail_99d755e0-3dd4-453d-8744-3ddec6243313_popular4-1-reranked-by-vidi%26sa%3DD%26source%3Deditors%26ust%3D1651087654366793%26usg%3DAOvVaw393tXu7_i5rylakkn8q0zk
I noticed that in every one of them the original URL is immediately followed by %26sa, so I wrote the following regex:

Code:

(?<=q%3D)(.*)(?=%26sa)
Finally, I wrote the following script using the above regex:

Code:

#T::
SaveVar=%ClipboardAll%
Clipboard=
Send ^c
ClipWait, 0.5
Clipboard := RegExMatch(Clipboard, "(?<=q%3D)(.*)(?=%26sa)")
Send ^v
Sleep 100
Clipboard=%SaveVar%
SaveVar=
return
I'm at a loss as to what is wrong. Isn't RegExMatch supposed to return the match, so that it ends up in the clipboard? I tested the regex in regex101's tester and it matches the URLs correctly.

Can someone help me clear this up?

Thanks in advance!
Descolada
Posts: 1202
Joined: 23 Dec 2021, 02:30

Re: Cleaning Google Translate URLs (with RegEx)  Topic is solved

27 Apr 2022, 14:47

sn1perwild wrote:
27 Apr 2022, 14:02
I'm at a loss about what is wrong. Isn't Regexmatch supposed to output the match into the clipboard? I tested the regex using regex101's tester and it matches with the URLs correctly.
No, RegExMatch returns the position of the match (0 if nothing matches), not the matched text. The match itself is stored in the variable passed as the third argument, and the captured subpatterns go into a pseudo-array (match1, match2, ...):

Code:

RegExMatch(Clipboard, "(?<=q%3D)(.*)(?=%26sa)", match)
Clipboard := match
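For reference, here is one way the original hotkey could look with the fix applied. This is a minimal sketch for AutoHotkey v1.1 that keeps the question's pattern and assumes the selection contains a single translated link. The ErrorLevel check (nothing was copied) and the check on RegExMatch's return value (no match) are additions, so that the clipboard is restored in every case.

```autohotkey
#T::
    SaveVar := ClipboardAll   ; back up the clipboard (all formats)
    Clipboard := ""           ; empty it so ClipWait can tell when ^c has finished
    Send ^c
    ClipWait, 0.5
    if ErrorLevel             ; nothing was copied within 0.5 seconds
    {
        Clipboard := SaveVar
        SaveVar := ""
        return
    }
    ; RegExMatch returns the position of the match (0 if there is none);
    ; the matched text itself is stored in the third parameter, match.
    if (RegExMatch(Clipboard, "(?<=q%3D)(.*)(?=%26sa)", match))
    {
        Clipboard := match
        Send ^v
        Sleep 100             ; let the target window read the clipboard first
    }
    Clipboard := SaveVar      ; restore the original clipboard contents
    SaveVar := ""             ; free the memory used by the backup
return
```

Because the pattern also contains a capture group, the same text is available as match1. With a pattern that has no lookarounds, such as "q%3D(.*)%26sa", match would include q%3D and %26sa, and you would use match1 instead.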
sn1perwild
Posts: 70
Joined: 04 Aug 2021, 15:11

Re: Cleaning Google Translate URLs (with RegEx)

27 Apr 2022, 19:13

Descolada wrote:
27 Apr 2022, 14:47
No, RegexMatch returns the position of the match. The match itself (along with a pseudoarray for all captured subpatterns) will be assigned to the third argument:

Code:

RegExMatch(Clipboard, "(?<=q%3D)(.*)(?=%26sa)", match)
Clipboard := match
Thank you so much, it worked! :)

Piggy-backing on this, I noticed that these URLs are percent-encoded starting from the "url?" part (for instance, %26sa is supposed to be &sa). I had another URL that broke the pattern because it contained %25, which decodes to % (so %2523 is a double-encoded #). Is there a way to decode the entire URL before passing it to RegExMatch?
AHKStudent
Posts: 1472
Joined: 05 May 2018, 12:23

Re: Cleaning Google Translate URLs (with RegEx)

27 Apr 2022, 22:52

sn1perwild wrote:
27 Apr 2022, 19:13
Piggy-backing on this, I noticed that these URLs are encoded starting from the "url?" part (for instance, that %26sa is supposed to be a &). I had another URL that broke the pattern because it had a %25 in it, which decodes to a #. Is there a way to decode the entire URL before passing it to the RegexMatch?
See viewtopic.php?t=84825, in particular @teadrinker's answer.
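The linked answer is not reproduced here. As an independent minimal sketch (not necessarily the approach used there), the two hypothetical functions below, UrlDecode and CleanTranslateUrl, decode %XX sequences as UTF-8 bytes in AutoHotkey v1.1 (Unicode build) and then extract the link. They assume the literal characters in the input are plain ASCII, as they are in these URLs. Since %25 is an encoded % rather than #, %2523 only becomes %23 on the first pass and # on the second, which is why the link is decoded twice.

```autohotkey
; Hypothetical helper: percent-decodes str. %XX sequences are treated as
; UTF-8 bytes, so encoded non-ASCII characters (e.g. %C3%A7) decode correctly.
UrlDecode(str) {
    ; The decoded UTF-8 bytes are never longer than the encoded form,
    ; so this zero-filled buffer (including the terminator) is big enough.
    VarSetCapacity(buf, StrPut(str, "UTF-8"), 0)
    n := 0              ; bytes written to buf so far
    i := 1              ; current position in str (1-based)
    len := StrLen(str)
    while (i <= len) {
        hex := SubStr(str, i + 1, 2)
        if (SubStr(str, i, 1) = "%" && RegExMatch(hex, "^[0-9A-Fa-f]{2}$")) {
            NumPut("0x" . hex, buf, n, "UChar")   ; store one decoded byte
            n += 1
            i += 3
        } else {
            ; Copy a literal character as UTF-8; StrPut's count includes the terminator.
            n += StrPut(SubStr(str, i, 1), &buf + n, "UTF-8") - 1
            i += 1
        }
    }
    return StrGet(&buf, "UTF-8")
}

; Hypothetical helper: returns the original link wrapped by Google Translate,
; or "" if no q= parameter is found.
CleanTranslateUrl(url) {
    ; First pass: undo the encoding of the u= value (%3D -> =, %26 -> &, %2523 -> %23).
    once := UrlDecode(url)
    ; The q= value ends at the first literal "&"; any "&" inside it is still %26.
    if (!RegExMatch(once, "[?&]q=([^&]*)", m))
        return ""
    ; Second pass: decode the q= value itself (%23 -> #, %3D -> =).
    return UrlDecode(m1)
}
```

For the first wired.com link in the question, this should yield https://www.wired.com/gallery/best-office-chairs/#intcid=_wired-right-rail_99d755e0-3dd4-453d-8744-3ddec6243313_popular4-1-reranked-by-vidi. Stopping at the first literal & after decoding, rather than at %26sa, avoids relying on sa always being the next parameter. In the hotkey, Clipboard := CleanTranslateUrl(Clipboard) could take the place of the RegExMatch line.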
sn1perwild
Posts: 70
Joined: 04 Aug 2021, 15:11

Re: Cleaning Google Translate URLs (with RegEx)

28 Apr 2022, 07:18

AHKStudent wrote:
27 Apr 2022, 22:52
viewtopic.php?t=84825 look at @teadrinker
Thank you!
