Remove letter accents in a string (function)

j-t-r
Posts: 16
Joined: 08 Jun 2015, 03:05

Remove letter accents in a string (function)

08 Jun 2015, 03:10

RemoveLetterAccents( text )

Supported letters/accents are (297 in total):
ÁáÀàÂâǍǎĂăÃãẢảẠạÄäÅåĀāĄąẤấẦầẪẫẨẩẬậẮắẰằẴẵẲẳẶặǺǻĆćĈĉČčĊċÇçĎďĐđÐÉéÈèÊêĚěĔĕẼẽẺẻĖėËëĒēĘęẾếỀềỄễỂểẸẹỆệĞğĜĝĠġĢģĤĥĦħÍíÌìĬĭÎîǏǐÏïĨĩĮįĪīỈỉỊịĴĵĶķĹĺĽľĻļŁłĿŀŃńŇňÑñŅņÓóÒòŎŏÔôỐốỒồỖỗỔổǑǒÖöŐőÕõØøǾǿŌōỎỏƠơỚớỜờỠỡỞởỢợỌọỘộṔṕṖṗŔŕŘřŖŗŚśŜŝŠšŞşŤťŢţŦŧÚúÙùŬŭÛûǓǔŮůÜüǗǘǛǜǙǚǕǖŰűŨũŲųŪūỦủƯưỨứỪừỮữỬửỰựỤụẂẃẀẁŴŵẄẅÝýỲỳŶŷŸÿỸỹỶỷỴỵŹźŽžŻż

USAGE:

Code: Select all

text = mïchâël
text2 := RemoveLetterAccents( text )
msgbox %text2%
FUNCTION:

Code: Select all

RemoveLetterAccents( text )
{
replace=ÁáÀàÂâǍǎĂăÃãẢảẠạÄäÅåĀāĄąẤấẦầẪẫẨẩẬậẮắẰằẴẵẲẳẶặǺǻĆćĈĉČčĊċÇçĎďĐđÐÉéÈèÊêĚěĔĕẼẽẺẻĖėËëĒēĘęẾếỀềỄễỂểẸẹỆệĞğĜĝĠġĢģĤĥĦħÍíÌìĬĭÎîǏǐÏïĨĩĮįĪīỈỉỊịĴĵĶķĹĺĽľĻļŁłĿŀŃńŇňÑñŅņÓóÒòŎŏÔôỐốỒồỖỗỔổǑǒÖöŐőÕõØøǾǿŌōỎỏƠơỚớỜờỠỡỞởỢợỌọỘộṔṕṖṗŔŕŘřŖŗŚśŜŝŠšŞşŤťŢţŦŧÚúÙùŬŭÛûǓǔŮůÜüǗǘǛǜǙǚǕǖŰűŨũŲųŪūỦủƯưỨứỪừỮữỬửỰựỤụẂẃẀẁŴŵẄẅÝýỲỳŶŷŸÿỸỹỶỷỴỵŹźŽžŻż
with=AaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaCcCcCcCcCcDdDdDEeEeEeEeEeEeEeEeEeEeEeEeEeEeEeEeEeGgGgGgGgHhHhIiIiIiIiIiIiIiIiIiIiIiJjKkLlLlLlLlLlNnNnNnNnOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoPpPpRrRrRrSsSsSsSsTtTtTtUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuWwWwWwWwYyYyYyYyYyYyYyZzZzZz
Loop, Parse, Replace
   {
    stringmid, w, with, a_index, 1
    stringreplace, text, text, %a_loopfield%, %w%, All
   }
return text
}
IDEAS/SOURCES:
http://www.forum.freecommander.com/viewtopic.php?f=18&t=6264#p20204
http://www.autohotkey.com/board/topic/52366-convertingreplacing-special-characters/?p=327814
http://textmechanic.com/Remove-Letter-Accents.html
lexikos
Posts: 6668
Joined: 30 Sep 2013, 04:07
GitHub: Lexikos

Re: Remove letter accents in a string (function)

12 Jun 2015, 04:10

Thanks to Unicode standards, there is a general way to decompose letters with marks into their base characters and combining marks. Unfortunately, the letters with stroke do not decompose.

Code: Select all

; Removes marks from letters.  Requires Windows Vista or later.
StrUnmark(string) {
    len := DllCall("Normaliz.dll\NormalizeString", "int", 2
        , "wstr", string, "int", StrLen(string)
        , "ptr", 0, "int", 0)  ; Get *estimated* required buffer size.
    Loop {
        VarSetCapacity(buf, len * 2)
        len := DllCall("Normaliz.dll\NormalizeString", "int", 2
            , "wstr", string, "int", StrLen(string)
            , "ptr", &buf, "int", len)
        if len >= 0
            break
        if (A_LastError != 122) ; ERROR_INSUFFICIENT_BUFFER
            return
        len *= -1  ; This is the new estimate.
    }
    ; Remove combining marks and return result.
    return RegExReplace(StrGet(&buf, len, "UTF-16"), "\pM")
}
Example:

Code: Select all

MsgBox % StrUnmark("ÁáÀàÂâǍǎĂăÃãẢảẠạÄäÅåĀāĄąẤấẦầẪẫẨẩẬậẮắẰằẴẵẲẳẶặǺǻĆćĈĉČčĊċÇçĎďĐđÐÉéÈèÊêĚěĔĕẼẽẺẻĖėËëĒēĘęẾếỀềỄễỂểẸẹỆệĞğĜĝĠġĢģĤĥĦħÍíÌìĬĭÎîǏǐÏïĨĩĮįĪīỈỉỊịĴĵĶķĹĺĽľĻļŁłĿŀŃńŇňÑñŅņÓóÒòŎŏÔôỐốỒồỖỗỔổǑǒÖöŐőÕõØøǾǿŌōỎỏƠơỚớỜờỠỡỞởỢợỌọỘộṔṕṖṗŔŕŘřŖŗŚśŜŝŠšŞşŤťŢţŦŧÚúÙùŬŭÛûǓǔŮůÜüǗǘǛǜǙǚǕǖŰűŨũŲųŪūỦủƯưỨứỪừỮữỬửỰựỤụẂẃẀẁŴŵẄẅÝýỲỳŶŷŸÿỸỹỶỷỴỵŹźŽžŻż")
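
Since the letters with stroke do not decompose, one possible workaround is a small explicit table applied after StrUnmark(). The following is only a sketch: StrUnmarkStroke, strokeFrom and strokeTo are hypothetical names, the letter list is an assumption and may be incomplete, and StrReplace() needs AutoHotkey v1.1.21 or later.

```autohotkey
; Hypothetical helper, not part of the original post. Assumes StrUnmark()
; from above is defined, the script is saved as UTF-8 with BOM, and
; StringCaseSense is left at its default (Off), so that only A-Z are
; compared case-insensitively.
StrUnmarkStroke(string) {
    ; Assumed, possibly incomplete list of letters that do not decompose.
    static strokeFrom := "ĐđÐðĦħŁłØøŦŧ"
    static strokeTo   := "DdDdHhLlOoTt"
    ; Strip marks first: Ǿ decomposes to Ø plus an acute, and the
    ; remaining Ø is then handled by the table below.
    string := StrUnmark(string)
    Loop, Parse, strokeFrom
        string := StrReplace(string, A_LoopField, SubStr(strokeTo, A_Index, 1))
    return string
}

MsgBox % StrUnmarkStroke("ĐđĦħŁłØøǾǿŦŧ")
```

Running the table after StrUnmark() keeps it short, because only the base stroke letters need to be listed and not their accented variants.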
haichen
Posts: 223
Joined: 09 Feb 2014, 08:24

Re: Remove letter accents in a string (function)

12 Jun 2015, 08:47

Some time ago I made a Unidecode port, though the code is not really nice. The original is Unidecode; its author has transliterated most (?) Unicode characters to ASCII.
lexikos
Posts: 6668
Joined: 30 Sep 2013, 04:07
GitHub: Lexikos

Re: Remove letter accents in a string (function)

12 Jun 2015, 19:49

You can convert to ASCII just by calling WideCharToMultiByte (code page 20127 is US-ASCII 7-bit) but unfortunately, about half of the characters in j-t-r's list get replaced with '?' (i.e. they have no "best fit" ASCII conversion according to Windows, though they clearly should). AutoHotkey's StrPut() can't be used because it uses the WC_NO_BEST_FIT_CHARS flag.

Removing the marks from letters isn't the same as converting to ASCII. Both j-t-r's function and my own will preserve any non-letter Unicode characters.
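
For reference, the WideCharToMultiByte call mentioned above could look roughly like the following. This is a minimal sketch, not a recommended solution: StrToAscii is a hypothetical name, a Unicode build of AutoHotkey v1 is assumed, and characters without a best-fit mapping still come back as '?'.

```autohotkey
; Hypothetical helper: converts a string to 7-bit US-ASCII (code page 20127)
; using Windows' best-fit mapping. Assumes a Unicode build of AutoHotkey v1.
StrToAscii(string) {
    ; First call: get the required buffer size in bytes. Passing -1 as the
    ; input length makes the size include the null terminator.
    size := DllCall("WideCharToMultiByte", "uint", 20127, "uint", 0
        , "wstr", string, "int", -1
        , "ptr", 0, "int", 0, "ptr", 0, "ptr", 0)
    if !size
        return
    VarSetCapacity(buf, size)
    ; Second call: perform the conversion. Flags are 0, so best-fit
    ; characters are allowed (no WC_NO_BEST_FIT_CHARS).
    DllCall("WideCharToMultiByte", "uint", 20127, "uint", 0
        , "wstr", string, "int", -1
        , "ptr", &buf, "int", size, "ptr", 0, "ptr", 0)
    ; Read the ASCII bytes back into a native string.
    return StrGet(&buf, size - 1, "CP20127")
}

MsgBox % StrToAscii("mïchâël")
```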
aaffe
Posts: 154
Joined: 16 Jan 2014, 04:23

Re: Remove letter accents in a string (function)

15 Jun 2015, 08:00

You can also substitute via an array and RegExReplace:

Code: Select all

text:="ááããbb"
msgbox % RemoveLetterAccents(text)

RemoveLetterAccents( text )
{
 static Array := {"á": "a", "ã": "a"}
 for key, val in Array
  text:=RegExReplace(text,key,val)
 return text
}
Guest

Re: Remove letter accents in a string (function)

15 Jun 2015, 08:13

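@aaffe - that is indeed a nice solution. I would swap the k,v around so that each base letter maps to a string of its accented variants, and since RE is case sensitive by default I would use two REs, one for lowercase and one for uppercase, to capture all.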

Code: Select all

text:="áÁãÃðÐ"
msgbox % RemoveLetterAccents(text)

RemoveLetterAccents(text)
	{
	 static Array := { "a" : "áàâǎăãảạäåāąấầẫẩậắằẵẳặǻ"
	 , "c" : "ćĉčċç"
	 , "d" : "ďđð"
	 , "e" : "éèêěĕẽẻėëēęếềễểẹệ"
	 , "g" : "ğĝġģ"
	 , "h" : "ĥħ"
	 , "i" : "íìĭîǐïĩįīỉị"
	 , "j" : "ĵ"
	 , "k" : "ķ"
	 , "l" : "ĺľļłŀ"
	 , "n" : "ńňñņ"
	 , "o" : "óòŏôốồỗổǒöőõøǿōỏơớờỡởợọộ"
	 , "p" : "ṕṗ"
	 , "r" : "ŕřŗ"
	 , "s" : "śŝšş"
	 , "t" : "ťţŧ"
	 , "u" : "úùŭûǔůüǘǜǚǖűũųūủưứừữửựụ"
	 , "w" : "ẃẁŵẅ"
	 , "y" : "ýỳŷÿỹỷỵ"
	 , "z" : "źžż" }
	 
	 for k, v in Array
		{
		 StringUpper, VU, v
		 StringUpper, KU, k
		 text:=RegExReplace(text,"[" v "]",k)
		 text:=RegExReplace(text,"[" VU "]",KU)
		}
	 Return text
	}
aaffe
Posts: 154
Joined: 16 Jan 2014, 04:23

Re: Remove letter accents in a string (function)

18 Jun 2015, 08:50

Wow, that's great, Guest!
Guest

Re: Remove letter accents in a string (function)

19 Jun 2015, 02:46

Purely for future reference and people reading this thread:

1 - If you have HTML entities in your text, you can use unhtm() by SKAN to translate them to text first. You can find a working copy of that function in Nextron's script: http://ahkscript.org/boards/viewtopic.p ... ilit=unhtm
(The original code on AutoHotkey . com is no longer valid due to the forum "upgrade".)
After unhtm() you can run RemoveLetterAccents().

2 - A more elaborate script is "Unidecode" by haichen http://ahkscript.org/boards/viewtopic.php?f=6&t=8257
ScottElliff
Posts: 1
Joined: 18 Jul 2019, 04:53

Re: Remove letter accents in a string (function)

18 Jul 2019, 05:09

Hi there,

Someone mentioned the RemoveLetterAccents tool on Stack Overflow in one of the discussions (I can't find it now, even when filtering by the AutoHotkey tag). Has anyone used it already?
