[SOLVED] How can a unicode ligature character be found and replaced with a script?

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
Klarion
Posts: 176
Joined: 26 Mar 2019, 10:02

Re: How can a unicode ligature character be found and replaced with a script?

18 Apr 2019, 05:08

.
Last edited by Klarion on 20 Apr 2019, 00:50, edited 1 time in total.
Klarion
Posts: 176
Joined: 26 Mar 2019, 10:02

Re: How can a unicode ligature character be found and replaced with a script?

18 Apr 2019, 05:14

.
Last edited by Klarion on 20 Apr 2019, 00:50, edited 1 time in total.
ineuw
Posts: 172
Joined: 11 Sep 2014, 14:12

Re: How can a unicode ligature character be found and replaced with a script?

18 Apr 2019, 06:45

Thanks for the help.
Win 10 Professional 64bit 21H2 16Gb Ram AHK current as of 2021-12-26 .
just me
Posts: 9458
Joined: 02 Oct 2013, 08:51
Location: Germany

Re: How can a unicode ligature character be found and replaced with a script?

18 Apr 2019, 07:43

In case you run into such a problem once again:

Copy the character you want to replace into an AHK Unicode script and use a simple StrReplace():

Code:

#NoEnv
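; example: "ﬁne" written with the U+FB01 ligature
SearchText := Chr(0xFB01)	; the "ﬁ" ligature character (needs AHK Unicode)
ReplaceText := "fi"		; two plain letters, f + i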

Original =
(
Soon after receiving the instructions to advance, Taylor had
given notice of his orders to influential citizens of Matamoros
then at Corpus Christi, explaining that his march would be
entirely pacific, and that he expected the pending questions to
be settled by negotiation; and similar assurances were con-
veyed to the Mexican customhouse office at “ Brazos Santiago,”
near Point Isabel‘ March 8 a more formal announcement
appeared in General Orders No. 30. Taylor here expressed
the hope that his movement would be “beneficial to all con-
cerned,” insisted upon a scrupulous regard for the civil and
religious rights of the people, and commanded that everything
required for the use of the army should be paid for “at the
highest market price." These orders, which merely antici-
pated instructions then on their way from Washington, were
translated into Spanish, and placed in circulation along the
border.<ref>18</ref>
)

Replaced := StrReplace(Original, SearchText, ReplaceText, Counter)

MsgBox, 0, Replaced %Counter% occurrences of %SearchText%, %Replaced%
Klarion
Posts: 176
Joined: 26 Mar 2019, 10:02

Re: How can a unicode ligature character be found and replaced with a script?

18 Apr 2019, 08:04

.
Last edited by Klarion on 20 Apr 2019, 00:54, edited 2 times in total.
Klarion
Posts: 176
Joined: 26 Mar 2019, 10:02

Re: How can a unicode ligature character be found and replaced with a script?

18 Apr 2019, 08:16

.
Last edited by Klarion on 20 Apr 2019, 00:49, edited 1 time in total.
ineuw
Posts: 172
Joined: 11 Sep 2014, 14:12

Re: How can a unicode ligature character be found and replaced with a script?

18 Apr 2019, 19:38

@just me Thanks for the advice. Please check my opening post in which my script is displayed. This is exactly what I did, and it doesn't work.

Perhaps it's my global settings at the top of the page that cause the problem, so I am pasting it here:

Code:

#SingleInstance force

#NoEnv

#Warn

SetWorkingDir, D:\ahk-functions-library

~LWin::vk07
~RWin::Return
~AppsKey::Return

#Include D:\ahk-functions-library\eng-pr.ahk
ineuw
Posts: 172
Joined: 11 Sep 2014, 14:12

Re: How can a unicode ligature character be found and replaced with a script?

18 Apr 2019, 22:03

Klarion wrote:
18 Apr 2019, 08:16
@ineuw
Feels like you are the people who hates tips
So, I'm giving you the straightforward ANSWER
.= is an operator
and
~= is an operator, again
Regards
Many thanks again. Contrary to what you think, I have been analyzing your code step by step and looking up everything in the AHK Help. I am slow because my programming interest is limited to what I need for proofreading. I understood your concept, but the output is empty. I understand the concatenation and the range filter applied to each character, but nothing seems to be passed to the function, because x is empty.

Here is my latest effort with msgboxes placed to follow the values.

Code:

;test.ahk

;alt-u                        ligature "fi" or - chr(0xEF)chr(0xAC)chr(0x81) or {U+FB01}
!u::
	autotrim, on
	clipboard =
	dirtyText =
	cleanResult =

	sendinput, ^a^c
	clipwait, 0
	dirtyText = %clipboard%

	Msgbox, % dirtyText			;INPUT OK

	clipwait, 0

	Loop, Parse, % dirtyText

		;msgbox, "loopfield",  %A_LoopField% ; THIS IS OK

		cleanResult . = A_LoopField ~ = "[\x{0000}-\x{007F}]" ? A_LoopField : NormalizationFormKC(A_LoopField)

		msgbox, "cleanresult", %cleanResult%	;NOTHING

	sendinput, %cleanResult%

return

NormalizationFormKC(x)

	msgbox, 1, "x_value", %x%

{
	a := StrLen(x) * 6
	VarSetCapacity(b, a)
	DllCall("Normaliz.dll\NormalizeString", "int", 5, "wstr", x, "int", StrLen(x), "ptr", &b, "int", a)
	Return StrGet(&b, a, "UTF-16")
}
ahk7
Posts: 575
Joined: 06 Nov 2013, 16:35

Re: How can a unicode ligature character be found and replaced with a script?  Topic is solved

19 Apr 2019, 03:16

Using MsgBox to debug is useful, and you have pinpointed the problematic line:
cleanResult . = A_LoopField ~ = "[\x{0000}-\x{007F}]" ? A_LoopField : NormalizationFormKC(A_LoopField)
You have a syntax error there: . = and ~ = are written with a space inside the operator, which won't work. Remove the spaces so that they read
.= and ~=

Change that line and it should work (or at least the code should produce a result; whether it is the desired result is another question).
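
To make the two operators concrete, here is a minimal sketch of each on its own. The variable names are made up for illustration, the pattern uses \x00-\x7F as a shorter spelling of the same ASCII range, and the second test assumes a Unicode build of AutoHotkey v1.1.

```autohotkey
#NoEnv

Result := ""
Result .= "abc"      ; .= appends to the variable: Result is now "abc"
Result .= "def"      ; Result is now "abcdef"

; ~= is shorthand for RegExMatch(): it yields the position of the first match, or 0 if there is none.
AsciiPos    := "a" ~= "[\x00-\x7F]"            ; 1, because "a" is in the ASCII range
LigaturePos := Chr(0xFB01) ~= "[\x00-\x7F]"    ; 0, because U+FB01 is outside it (Unicode build assumed)

MsgBox, % Result . "`n" . AsciiPos . "`n" . LigaturePos
```

Written with a space inside (. = or ~ =), neither operator is recognized as a single token, which is why the original line fails.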
just me
Posts: 9458
Joined: 02 Oct 2013, 08:51
Location: Germany

Re: How can a unicode ligature character be found and replaced with a script?

19 Apr 2019, 05:50

NormalizeString() should work with the whole string at once:

Code:

#NoEnv

DirtyText =
(
Soon after receiving the instructions to advance, Taylor had
given notice of his orders to influential citizens of Matamoros
then at Corpus Christi, explaining that his march would be
entirely pacific, and that he expected the pending questions to
be settled by negotiation; and similar assurances were con-
veyed to the Mexican customhouse office at “ Brazos Santiago,”
near Point Isabel‘ March 8 a more formal announcement
appeared in General Orders No. 30. Taylor here expressed
the hope that his movement would be “beneficial to all con-
cerned,” insisted upon a scrupulous regard for the civil and
religious rights of the people, and commanded that everything
required for the use of the army should be paid for “at the
highest market price." These orders, which merely antici-
pated instructions then on their way from Washington, were
translated into Spanish, and placed in circulation along the
border.<ref>18</ref>
)

CleanResult := NormalizeFormKC(DirtyText)
MsgBox, 0, CleanResult, %CleanResult%

ExitApp

NormalizeFormKC(Str)
{
	; docs.microsoft.com/en-us/windows/desktop/api/winnls/nf-winnls-normalizestring
	; MsgBox, 1, Str Value, %Str%
	Length := DllCall("Normaliz.dll\NormalizeString", "Int", 5, "WStr", Str, "Int", -1, "Ptr", 0, "Int", 0, "Int")
	If (Length > 0)
	{
		VarSetCapacity(Out, Length * 2, 0)
		DllCall("Normaliz.dll\NormalizeString", "Int", 5, "WStr", Str, "Int", -1, "Ptr", &Out, "Int", Length, "Int")
		Return StrGet(&Out, "UTF-16")
	}
}
ineuw wrote:Thanks for the advice. Please check my opening post in which my script is displayed. This is exactly what I did, and it doesn't work.
Do you run AHK Unicode? I copied the text from your post and StrReplace() is working for me. The character code is 0xFB01 in UTF-16; 0xEFAC81 is the same character encoded as UTF-8.
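
Both points can be checked from a script. The following is only a sketch, assuming AutoHotkey v1.1.21 or later (for Ord() and Format()); the variable names are invented for the example. It first tests the built-in A_IsUnicode variable, then shows the UTF-16 code unit of the ligature and the bytes of its UTF-8 encoding.

```autohotkey
#NoEnv

; A_IsUnicode is 1 in Unicode builds and blank in ANSI builds.
if (!A_IsUnicode)
{
	MsgBox, % "This is an ANSI build. Characters such as U+FB01 cannot be handled reliably."
	ExitApp
}

Ligature := Chr(0xFB01)
CodeUnit := Format("0x{:04X}", Ord(Ligature))	; UTF-16 code unit of the character

; Encode the character as UTF-8 and collect the bytes as hex digits.
ByteCount := StrPut(Ligature, "UTF-8") - 1	; StrPut's size includes the terminating zero
VarSetCapacity(Buf, ByteCount + 1, 0)
StrPut(Ligature, &Buf, "UTF-8")
Utf8 := ""
Loop, % ByteCount
	Utf8 .= Format("{:02X}", NumGet(Buf, A_Index - 1, "UChar"))

MsgBox, % "UTF-16: " . CodeUnit . "`nUTF-8: 0x" . Utf8
```

In a Unicode build, strings are held as UTF-16, so a search has to target the single code unit 0xFB01, not the three UTF-8 bytes.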

@Klarion:
We, usually, call that kind of programming style as ReInventing the Wheels
It's just an option for people like you not willing to learn about C/C++ function calls. ;)
Klarion
Posts: 176
Joined: 26 Mar 2019, 10:02

Re: [SOLVED] How can a unicode ligature character be found and replaced with a script?

20 Apr 2019, 00:55

@ineuw
very nice

I am asking you "kindly" to delete all of my codes from your computer and your head and do not use it ever
I do not want you to use my code

If you want to do something, write your own code


Regards
ineuw
Posts: 172
Joined: 11 Sep 2014, 14:12

Re: [SOLVED] How can a unicode ligature character be found and replaced with a script?

20 Apr 2019, 06:58

Klarion wrote:
20 Apr 2019, 00:55
@ineuw
very nice

I am asking you "kindly" to delete all of my codes from your computer and your head and do not use it ever
I do not want you to use my code

If you want to do something, write your own code

Regards
@Klarion, will do as you wish, but I have no clue what happened here. Before marking the topic solved, I posted a thank you note to everyone who helped but this note is missing. Can you please explain what got you so upset?
zos474
Posts: 2
Joined: 23 Dec 2022, 00:03

Re: [SOLVED] How can a unicode ligature character be found and replaced with a script?

23 Dec 2022, 00:08

Not sure if this might help others

Code:

SearchText := "\x{FB00}"
ReplaceText := "ff"
txtHold := RegExReplace(txtHold, SearchText, ReplaceText)
SearchText := "\x{FB01}"
ReplaceText := "fi"
txtHold := RegExReplace(txtHold, SearchText, ReplaceText)
SearchText := "\x{FB02}"
ReplaceText := "fl"
txtHold := RegExReplace(txtHold, SearchText, ReplaceText)
SearchText := "\x{FB03}"
ReplaceText := "ffi"
txtHold := RegExReplace(txtHold, SearchText, ReplaceText)
SearchText := "\x{FB04}"
ReplaceText := "ffl"
txtHold := RegExReplace(txtHold, SearchText, ReplaceText)
SearchText := "\x{FB05}"
ReplaceText := "st"	; U+FB05 is the long-s + t ligature, so it reads "st", not "ft"
txtHold := RegExReplace(txtHold, SearchText, ReplaceText)
SearchText := "\x{FB06}"
ReplaceText := "st"
txtHold := RegExReplace(txtHold, SearchText, ReplaceText)
zos474
Posts: 2
Joined: 23 Dec 2022, 00:03

Re: [SOLVED] How can a unicode ligature character be found and replaced with a script?

23 Dec 2022, 00:21

Or more directly :)

Code:

;Replace ligatures
txtHold := RegExReplace(txtHold, "\x{FB00}", "ff")
txtHold := RegExReplace(txtHold, "\x{FB01}", "fi")
txtHold := RegExReplace(txtHold, "\x{FB02}", "fl")
txtHold := RegExReplace(txtHold, "\x{FB03}", "ffi")
txtHold := RegExReplace(txtHold, "\x{FB04}", "ffl")
txtHold := RegExReplace(txtHold, "\x{FB05}", "st")	; U+FB05 is the long-s + t ligature
txtHold := RegExReplace(txtHold, "\x{FB06}", "st")
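
Since the seven ligatures occupy the consecutive code points U+FB00 to U+FB06, the same replacements can also be written as a loop. This is just a sketch: ReplaceLigatures() is a hypothetical helper name, not something from the posts above, and it assumes a Unicode build of AutoHotkey v1.1.21 or later (for StrReplace()). U+FB05, the long-s + t ligature, is mapped to "st" here, which is its NFKC form.

```autohotkey
#NoEnv

; Hypothetical helper: replaces U+FB00..U+FB06 with their plain-letter spellings.
ReplaceLigatures(Text)
{
	; Spellings in code point order, starting at U+FB00.
	Plain := ["ff", "fi", "fl", "ffi", "ffl", "st", "st"]
	for Index, Spelling in Plain
		Text := StrReplace(Text, Chr(0xFB00 + Index - 1), Spelling)
	return Text
}

; Made-up sample: "office floor" written with the U+FB01 and U+FB02 ligatures.
Sample := "of" . Chr(0xFB01) . "ce " . Chr(0xFB02) . "oor"
MsgBox, % ReplaceLigatures(Sample)
```

StrReplace() is enough here because each search string is a single literal character, so no regular expression is needed.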
ineuw
Posts: 172
Joined: 11 Sep 2014, 14:12

Re: [SOLVED] How can a unicode ligature character be found and replaced with a script?

23 Dec 2022, 19:24

Thanks for your solution, and happy holidays.