Remove letter accents in a string (function)

j-t-r
Posts: 17
Joined: 08 Jun 2015, 03:05

Remove letter accents in a string (function)

08 Jun 2015, 03:10

RemoveLetterAccents( text )

Supported letters/accents are (297 in total):
ÁáÀàÂâǍǎĂăÃãẢảẠạÄäÅåĀāĄąẤấẦầẪẫẨẩẬậẮắẰằẴẵẲẳẶặǺǻĆćĈĉČčĊċÇçĎďĐđÐÉéÈèÊêĚěĔĕẼẽẺẻĖėËëĒēĘęẾếỀềỄễỂểẸẹỆệĞğĜĝĠġĢģĤĥĦħÍíÌìĬĭÎîǏǐÏïĨĩĮįĪīỈỉỊịĴĵĶķĹĺĽľĻļŁłĿŀŃńŇňÑñŅņÓóÒòŎŏÔôỐốỒồỖỗỔổǑǒÖöŐőÕõØøǾǿŌōỎỏƠơỚớỜờỠỡỞởỢợỌọỘộṔṕṖṗŔŕŘřŖŗŚśŜŝŠšŞşŤťŢţŦŧÚúÙùŬŭÛûǓǔŮůÜüǗǘǛǜǙǚǕǖŰűŨũŲųŪūỦủƯưỨứỪừỮữỬửỰựỤụẂẃẀẁŴŵẄẅÝýỲỳŶŷŸÿỸỹỶỷỴỵŹźŽžŻż

USAGE:

Code: Select all

text = mïchâël
text2 := RemoveLetterAccents( text )
msgbox %text2%
FUNCTION:

Code: Select all

RemoveLetterAccents( text )
{
replace=ÁáÀàÂâǍǎĂăÃãẢảẠạÄäÅåĀāĄąẤấẦầẪẫẨẩẬậẮắẰằẴẵẲẳẶặǺǻĆćĈĉČčĊċÇçĎďĐđÐÉéÈèÊêĚěĔĕẼẽẺẻĖėËëĒēĘęẾếỀềỄễỂểẸẹỆệĞğĜĝĠġĢģĤĥĦħÍíÌìĬĭÎîǏǐÏïĨĩĮįĪīỈỉỊịĴĵĶķĹĺĽľĻļŁłĿŀŃńŇňÑñŅņÓóÒòŎŏÔôỐốỒồỖỗỔổǑǒÖöŐőÕõØøǾǿŌōỎỏƠơỚớỜờỠỡỞởỢợỌọỘộṔṕṖṗŔŕŘřŖŗŚśŜŝŠšŞşŤťŢţŦŧÚúÙùŬŭÛûǓǔŮůÜüǗǘǛǜǙǚǕǖŰűŨũŲųŪūỦủƯưỨứỪừỮữỬửỰựỤụẂẃẀẁŴŵẄẅÝýỲỳŶŷŸÿỸỹỶỷỴỵŹźŽžŻż
with=AaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaCcCcCcCcCcDdDdDEeEeEeEeEeEeEeEeEeEeEeEeEeEeEeEeEeGgGgGgGgHhHhIiIiIiIiIiIiIiIiIiIiIiJjKkLlLlLlLlLlNnNnNnNnOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoPpPpRrRrRrSsSsSsSsTtTtTtUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuWwWwWwWwYyYyYyYyYyYyYyZzZzZz
Loop, Parse, Replace
   {
    stringmid, w, with, a_index, 1
    stringreplace, text, text, %a_loopfield%, %w%, All
   }
return text
}
IDEAS/SOURCES:
http://www.forum.freecommander.com/viewtopic.php?f=18&t=6264#p20204
http://www.autohotkey.com/board/topic/52366-convertingreplacing-special-characters/?p=327814
http://textmechanic.com/Remove-Letter-Accents.html
lexikos
Posts: 9583
Joined: 30 Sep 2013, 04:07

Re: Remove letter accents in a string (function)

12 Jun 2015, 04:10

Thanks to Unicode standards, there is a general way to decompose letters with marks into their base characters and combining marks. Unfortunately, the letters with stroke do not decompose.

Code: Select all

; Removes marks from letters.  Requires Windows Vista or later.
StrUnmark(string) {
    len := DllCall("Normaliz.dll\NormalizeString", "int", 2
        , "wstr", string, "int", StrLen(string)
        , "ptr", 0, "int", 0)  ; Get *estimated* required buffer size.
    Loop {
        VarSetCapacity(buf, len * 2)
        len := DllCall("Normaliz.dll\NormalizeString", "int", 2
            , "wstr", string, "int", StrLen(string)
            , "ptr", &buf, "int", len)
        if len >= 0
            break
        if (A_LastError != 122) ; ERROR_INSUFFICIENT_BUFFER
            return
        len *= -1  ; This is the new estimate.
    }
    ; Remove combining marks and return result.
    return RegExReplace(StrGet(&buf, len, "UTF-16"), "\pM")
}
Example:

Code: Select all

MsgBox % StrUnmark("ÁáÀàÂâǍǎĂăÃãẢảẠạÄäÅåĀāĄąẤấẦầẪẫẨẩẬậẮắẰằẴẵẲẳẶặǺǻĆćĈĉČčĊċÇçĎďĐđÐÉéÈèÊêĚěĔĕẼẽẺẻĖėËëĒēĘęẾếỀềỄễỂểẸẹỆệĞğĜĝĠġĢģĤĥĦħÍíÌìĬĭÎîǏǐÏïĨĩĮįĪīỈỉỊịĴĵĶķĹĺĽľĻļŁłĿŀŃńŇňÑñŅņÓóÒòŎŏÔôỐốỒồỖỗỔổǑǒÖöŐőÕõØøǾǿŌōỎỏƠơỚớỜờỠỡỞởỢợỌọỘộṔṕṖṗŔŕŘřŖŗŚśŜŝŠšŞşŤťŢţŦŧÚúÙùŬŭÛûǓǔŮůÜüǗǘǛǜǙǚǕǖŰűŨũŲųŪūỦủƯưỨứỪừỮữỬửỰựỤụẂẃẀẁŴŵẄẅÝýỲỳŶŷŸÿỸỹỶỷỴỵŹźŽžŻż")
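
As noted above, letters with a stroke (and eth) have no canonical decomposition, so they come out of StrUnmark() unchanged. A minimal sketch of a follow-up pass, assuming AutoHotkey v1.1.21+ (for StrReplace()) and a script file saved as UTF-8 with BOM. The name ReplaceStrokeLetters and its deliberately short table are hypothetical and not part of the function above:

```autohotkey
MsgBox % ReplaceStrokeLetters("Øresund, Łeba, Đakovo")  ; shows "Oresund, Leba, Dakovo"

; Hypothetical helper: maps letters that do not decompose to plain base letters.
; The table is illustrative, not exhaustive; extend both strings in parallel.
ReplaceStrokeLetters(string) {
    static strokeLetters := "ĐđÐðĦħŁłØøŦŧ"
    static baseLetters   := "DdDdHhLlOoTt"
    prevCaseSense := A_StringCaseSense   ; remember the caller's setting
    StringCaseSense, On                  ; keep upper- and lowercase pairs distinct
    Loop, Parse, strokeLetters
        string := StrReplace(string, A_LoopField, SubStr(baseLetters, A_Index, 1))
    StringCaseSense, %prevCaseSense%     ; restore the caller's setting
    return string
}
```

It can be chained after the function above, for example ReplaceStrokeLetters(StrUnmark(text)).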
haichen
Posts: 631
Joined: 09 Feb 2014, 08:24

Re: Remove letter accents in a string (function)

12 Jun 2015, 08:47

Some time ago I made a Unidecode port, but it is not really nice code. The original: Unidecode! Its author has transliterated most(?) Unicode characters to ASCII.
lexikos
Posts: 9583
Joined: 30 Sep 2013, 04:07

Re: Remove letter accents in a string (function)

12 Jun 2015, 19:49

You can convert to ASCII just by calling WideCharToMultiByte (code page 20127 is US-ASCII 7-bit) but unfortunately, about half of the characters in j-t-r's list get replaced with '?' (i.e. they have no "best fit" ASCII conversion according to Windows, though they clearly should). AutoHotkey's StrPut() can't be used because it uses the WC_NO_BEST_FIT_CHARS flag.

Removing the marks from letters isn't the same as converting to ASCII. Both j-t-r's function and my own will preserve any non-letter Unicode characters.
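
For reference, the WideCharToMultiByte route described above might look roughly like this. It is a sketch only: StrToAscii is a hypothetical name, the flags argument is left at 0 so that best-fit mapping stays enabled, and characters without a best-fit mapping still come back as '?', as described above.

```autohotkey
MsgBox % StrToAscii("mïchâël")

; Hypothetical helper: converts a string to 7-bit US-ASCII (code page 20127).
StrToAscii(string) {
    ; First call: ask for the required buffer size in bytes.
    ; cchWideChar = -1 means "null-terminated", so the size includes the terminator.
    size := DllCall("WideCharToMultiByte", "uint", 20127, "uint", 0
        , "wstr", string, "int", -1
        , "ptr", 0, "int", 0, "ptr", 0, "ptr", 0)
    if (size <= 0)
        return  ; conversion failed; return an empty string
    VarSetCapacity(buf, size, 0)
    ; Second call: perform the actual conversion into buf.
    DllCall("WideCharToMultiByte", "uint", 20127, "uint", 0
        , "wstr", string, "int", -1
        , "ptr", &buf, "int", size, "ptr", 0, "ptr", 0)
    ; Read the null-terminated ASCII bytes back into a native string.
    return StrGet(&buf, "CP20127")
}
```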
aaffe
Posts: 192
Joined: 16 Jan 2014, 04:23

Re: Remove letter accents in a string (function)

15 Jun 2015, 08:00

You can also substitute via an array and RegExReplace:

Code: Select all

text:="ááããbb"
msgbox % RemoveLetterAccents(text)

RemoveLetterAccents( text )
{
 static Array := {"á": "a", "ã": "a"}
 for key, val in Array
  text:=RegExReplace(text,key,val)
 return text
}
Guest

Re: Remove letter accents in a string (function)

15 Jun 2015, 08:13

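@aaffe - that is indeed a nice solution. I would swap k and v around so that each base letter maps to all of its accented variants, and since regular expressions are case-sensitive by default, I would use two of them per letter (one lowercase, one uppercase) to capture everything.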

Code: Select all

text:="áÁãÃðÐ"
msgbox % RemoveLetterAccents(text)

RemoveLetterAccents(text)
	{
	 static Array := { "a" : "áàâǎăãảạäåāąấầẫẩậắằẵẳặǻ"
	 , "c" : "ćĉčċç"
	 , "d" : "ďđð"
	 , "e" : "éèêěĕẽẻėëēęếềễểẹệ"
	 , "g" : "ğĝġģ"
	 , "h" : "ĥħ"
	 , "i" : "íìĭîǐïĩįīỉị"
	 , "j" : "ĵ"
	 , "k" : "ķ"
	 , "l" : "ĺľļłŀ"
	 , "n" : "ńňñņ"
	 , "o" : "óòŏôốồỗổǒöőõøǿōỏơớờỡởợọộ"
	 , "p" : "ṕṗ"
	 , "r" : "ŕřŗ"
	 , "s" : "śŝšş"
	 , "t" : "ťţŧ"
	 , "u" : "úùŭûǔůüǘǜǚǖűũųūủưứừữửựụ"
	 , "w" : "ẃẁŵẅ"
	 , "y" : "ýỳŷÿỹỷỵ"
	 , "z" : "źžż" }
	 
	 for k, v in Array
		{
		 StringUpper, VU, v
		 StringUpper, KU, k
		 text:=RegExReplace(text,"[" v "]",k)
		 text:=RegExReplace(text,"[" VU "]",KU)
		}
	 Return text
	}
aaffe
Posts: 192
Joined: 16 Jan 2014, 04:23

Re: Remove letter accents in a string (function)

18 Jun 2015, 08:50

Wow, that's great, Guest!
Guest

Re: Remove letter accents in a string (function)

19 Jun 2015, 02:46

For future reference, and for people reading this thread:

1 - If you have HTML entities in your text, you can use unhtm() by SKAN to translate them to text first. You can find a working copy of that function in Nextron's script http://ahkscript.org/boards/viewtopic.p ... ilit=unhtm
(the original code on AutoHotkey . com is no longer valid due to the forum "upgrade").
After unhtm() you can run RemoveLetterAccents().

2 - A more elaborate script is "Unidecode" by haichen http://ahkscript.org/boards/viewtopic.php?f=6&t=8257
ScottElliff
Posts: 1
Joined: 18 Jul 2019, 04:53

Re: Remove letter accents in a string (function)

18 Jul 2019, 05:09

Hi there,

Someone mentioned the RemoveLetterAccents tool on Stack Overflow in one of the discussions (I can't find it now, even when filtering by the AutoHotkey tag). Has anyone used it already?
bigbadplayer
Posts: 10
Joined: 12 Feb 2016, 03:00

Re: Remove letter accents in a string (function)

25 Feb 2020, 10:14

Very great code! Nice and simple!

I've used it in this very basic way: copy the selected text to the clipboard, replace the accented characters, then insert the text in place of the original.

Code: Select all

+^ö::	;Change accent chars
{
	ClipSaved1 := ClipboardAll   ; Save the entire clipboard to a variable of your choice.
	; ... here make temporary use of the clipboard, such as for pasting Unicode text via Transform Unicode ...
	Clipboard := ""	; Clear the clipboard so ClipWait can detect the copy
	Send, ^c		; Copy selection
	Clipwait 2, 1
	if (Clipboard = "") {
		;Do nothing
	} else {
		Clipboard := RemoveLetterAccents(Clipboard)
		SendInput {Raw}%Clipboard%	; {Raw} keeps characters such as ! ^ + # literal
	}
	Clipboard := ClipSaved1   ; Restore the original clipboard. Note the use of Clipboard (not ClipboardAll).
	Clipwait 2, 1
Return
}

RemoveLetterAccents(text)	; by value: Clipboard cannot be passed to a ByRef parameter
	{
	 static Array := { "a" : "áàâǎăãảạäåāąấầẫẩậắằẵẳặǻ"
	 , "c" : "ćĉčċç"
	 , "d" : "ďđð"
	 , "e" : "éèêěĕẽẻėëēęếềễểẹệ"
	 , "g" : "ğĝġģ"
	 , "h" : "ĥħ"
	 , "i" : "íìĭîǐïĩįīỉị"
	 , "j" : "ĵ"
	 , "k" : "ķ"
	 , "l" : "ĺľļłŀ"
	 , "n" : "ńňñņ"
	 , "o" : "óòŏôốồỗổǒöőõøǿōỏơớờỡởợọộ"
	 , "p" : "ṕṗ"
	 , "r" : "ŕřŗ"
	 , "s" : "śŝšş"
	 , "t" : "ťţŧ"
	 , "u" : "úùŭûǔůüǘǜǚǖűũųūủưứừữửựụ"
	 , "w" : "ẃẁŵẅ"
	 , "y" : "ýỳŷÿỹỷỵ"
	 , "z" : "źžż" }
	 
	 for k, v in Array
		{
		 StringUpper, VU, v
		 StringUpper, KU, k
		 text:=RegExReplace(text,"[" v "]",k)
		 text:=RegExReplace(text,"[" VU "]",KU)
		}
	 Return text
	}
