Remove letter accents in a string (function)

Posted: 08 Jun 2015, 03:10
by j-t-r
RemoveLetterAccents( text )

Supported letters/accents are (297 in total):
ÁáÀàÂâǍǎĂăÃãẢảẠạÄäÅåĀāĄąẤấẦầẪẫẨẩẬậẮắẰằẴẵẲẳẶặǺǻĆćĈĉČčĊċÇçĎďĐđÐÉéÈèÊêĚěĔĕẼẽẺẻĖėËëĒēĘęẾếỀềỄễỂểẸẹỆệĞğĜĝĠġĢģĤĥĦħÍíÌìĬĭÎîǏǐÏïĨĩĮįĪīỈỉỊịĴĵĶķĹĺĽľĻļŁłĿŀŃńŇňÑñŅņÓóÒòŎŏÔôỐốỒồỖỗỔổǑǒÖöŐőÕõØøǾǿŌōỎỏƠơỚớỜờỠỡỞởỢợỌọỘộṔṕṖṗŔŕŘřŖŗŚśŜŝŠšŞşŤťŢţŦŧÚúÙùŬŭÛûǓǔŮůÜüǗǘǛǜǙǚǕǖŰűŨũŲųŪūỦủƯưỨứỪừỮữỬửỰựỤụẂẃẀẁŴŵẄẅÝýỲỳŶŷŸÿỸỹỶỷỴỵŹźŽžŻż

USAGE:

Code: Select all

text = mïchâël
text2 := RemoveLetterAccents( text )
msgbox %text2%

FUNCTION:

Code: Select all

RemoveLetterAccents( text )
{
replace=ÁáÀàÂâǍǎĂăÃãẢảẠạÄäÅåĀāĄąẤấẦầẪẫẨẩẬậẮắẰằẴẵẲẳẶặǺǻĆćĈĉČčĊċÇçĎďĐđÐÉéÈèÊêĚěĔĕẼẽẺẻĖėËëĒēĘęẾếỀềỄễỂểẸẹỆệĞğĜĝĠġĢģĤĥĦħÍíÌìĬĭÎîǏǐÏïĨĩĮįĪīỈỉỊịĴĵĶķĹĺĽľĻļŁłĿŀŃńŇňÑñŅņÓóÒòŎŏÔôỐốỒồỖỗỔổǑǒÖöŐőÕõØøǾǿŌōỎỏƠơỚớỜờỠỡỞởỢợỌọỘộṔṕṖṗŔŕŘřŖŗŚśŜŝŠšŞşŤťŢţŦŧÚúÙùŬŭÛûǓǔŮůÜüǗǘǛǜǙǚǕǖŰűŨũŲųŪūỦủƯưỨứỪừỮữỬửỰựỤụẂẃẀẁŴŵẄẅÝýỲỳŶŷŸÿỸỹỶỷỴỵŹźŽžŻż
with=AaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaCcCcCcCcCcDdDdDEeEeEeEeEeEeEeEeEeEeEeEeEeEeEeEeEeGgGgGgGgHhHhIiIiIiIiIiIiIiIiIiIiIiJjKkLlLlLlLlLlNnNnNnNnOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoPpPpRrRrRrSsSsSsSsTtTtTtUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuUuWwWwWwWwYyYyYyYyYyYyYyZzZzZz
Loop, Parse, Replace
   {
    stringmid, w, with, a_index, 1
    stringreplace, text, text, %a_loopfield%, %w%, All
   }
return text
}

IDEAS/SOURCES:
http://www.forum.freecommander.com/viewtopic.php?f=18&t=6264#p20204
http://www.autohotkey.com/board/topic/52366-convertingreplacing-special-characters/?p=327814
http://textmechanic.com/Remove-Letter-Accents.html

Re: Remove letter accents in a string (function)

Posted: 08 Jun 2015, 13:15
by j-t-r

Re: Remove letter accents in a string (function)

Posted: 12 Jun 2015, 04:10
by lexikos
Thanks to Unicode standards, there is a general way to decompose letters with marks into their base characters and combining marks. Unfortunately, the letters with stroke do not decompose.

Code: Select all

; Removes marks from letters.  Requires Windows Vista or later.
StrUnmark(string) {
    len := DllCall("Normaliz.dll\NormalizeString", "int", 2
        , "wstr", string, "int", StrLen(string)
        , "ptr", 0, "int", 0)  ; Get *estimated* required buffer size.
    Loop {
        VarSetCapacity(buf, len * 2)
        len := DllCall("Normaliz.dll\NormalizeString", "int", 2
            , "wstr", string, "int", StrLen(string)
            , "ptr", &buf, "int", len)
        if len >= 0
            break
        if (A_LastError != 122) ; ERROR_INSUFFICIENT_BUFFER
            return
        len *= -1  ; This is the new estimate.
    }
    ; Remove combining marks and return result.
    return RegExReplace(StrGet(&buf, len, "UTF-16"), "\pM")
}

Example:

Code: Select all

MsgBox % StrUnmark("ÁáÀàÂâǍǎĂăÃãẢảẠạÄäÅåĀāĄąẤấẦầẪẫẨẩẬậẮắẰằẴẵẲẳẶặǺǻĆćĈĉČčĊċÇçĎďĐđÐÉéÈèÊêĚěĔĕẼẽẺẻĖėËëĒēĘęẾếỀềỄễỂểẸẹỆệĞğĜĝĠġĢģĤĥĦħÍíÌìĬĭÎîǏǐÏïĨĩĮįĪīỈỉỊịĴĵĶķĹĺĽľĻļŁłĿŀŃńŇňÑñŅņÓóÒòŎŏÔôỐốỒồỖỗỔổǑǒÖöŐőÕõØøǾǿŌōỎỏƠơỚớỜờỠỡỞởỢợỌọỘộṔṕṖṗŔŕŘřŖŗŚśŜŝŠšŞşŤťŢţŦŧÚúÙùŬŭÛûǓǔŮůÜüǗǘǛǜǙǚǕǖŰűŨũŲųŪūỦủƯưỨứỪừỮữỬửỰựỤụẂẃẀẁŴŵẄẅÝýỲỳŶŷŸÿỸỹỶỷỴỵŹźŽžŻż")
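
The letters that StrUnmark() leaves untouched (the ones with a stroke, plus a few others) could be handled with a small lookup on top of it. The following is only an untested sketch: StrUnmarkPlus is a hypothetical name, it assumes StrUnmark() from this post is included in the same script, that the script is saved as UTF-8 with BOM and run on Unicode AutoHotkey v1.1, and that StringCaseSense is left at its default. The character list is my own reading of which letters in j-t-r's list have no canonical decomposition, so treat it as an assumption rather than a complete table.

```autohotkey
; Hypothetical wrapper, not part of the original post.
; Requires StrUnmark() (above) in the same script.
StrUnmarkPlus(string) {
    ; Letters assumed to survive decomposition, and their plain replacements.
    ; Both strings must stay the same length (13 characters each).
    static from := "ĐđÐĦħŁłĿŀØøŦŧ"
         , to   := "DdDHhLlLlOoTt"
    ; Strip combining marks first; this also turns Ǿǿ into Øø.
    string := StrUnmark(string)
    ; Then replace each remaining letter with the character
    ; at the same position in "to".
    Loop, Parse, from
        StringReplace, string, string, %A_LoopField%, % SubStr(to, A_Index, 1), All
    return string
}

MsgBox % StrUnmarkPlus("Łódź, Đà Nẵng, Søren")
```

Like both functions above, this leaves every other non-letter Unicode character in place.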

Re: Remove letter accents in a string (function)

Posted: 12 Jun 2015, 08:47
by haichen
Some time ago I made a Unidecode port, though the code isn't really nice. The original is Unidecode; its author has translated most (?) Unicode characters to ASCII.

Re: Remove letter accents in a string (function)

Posted: 12 Jun 2015, 19:49
by lexikos
You can convert to ASCII just by calling WideCharToMultiByte (code page 20127 is US-ASCII 7-bit) but unfortunately, about half of the characters in j-t-r's list get replaced with '?' (i.e. they have no "best fit" ASCII conversion according to Windows, though they clearly should). AutoHotkey's StrPut() can't be used because it uses the WC_NO_BEST_FIT_CHARS flag.

Removing the marks from letters isn't the same as converting to ASCII. Both j-t-r's function and my own will preserve any non-letter Unicode characters.
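
For anyone who wants to try the WideCharToMultiByte conversion described above, a minimal, untested sketch follows. StrToAscii is a hypothetical name, and the code assumes a Unicode build of AutoHotkey v1.1. The flags parameter is passed as 0, so WC_NO_BEST_FIT_CHARS is not set and Windows may apply its best-fit mapping where it has one.

```autohotkey
; Hypothetical helper, not part of the original post.
; Converts a string to 7-bit US-ASCII (code page 20127).
StrToAscii(string) {
    static CP_US_ASCII := 20127
    ; First call: null buffer, returns the required size in bytes.
    ; Source length -1 means "null-terminated", so the size includes the terminator.
    size := DllCall("WideCharToMultiByte", "uint", CP_US_ASCII, "uint", 0
        , "wstr", string, "int", -1
        , "ptr", 0, "int", 0
        , "ptr", 0, "ptr", 0)
    if (size <= 0)
        return
    VarSetCapacity(buf, size, 0)
    ; Second call: perform the conversion into buf.
    DllCall("WideCharToMultiByte", "uint", CP_US_ASCII, "uint", 0
        , "wstr", string, "int", -1
        , "ptr", &buf, "int", size
        , "ptr", 0, "ptr", 0)
    ; Read the ASCII bytes back into a native string.
    return StrGet(&buf, "CP20127")
}

MsgBox % StrToAscii("mïchâël Łódź €")
```

Unlike the mark-removal functions, this replaces everything outside ASCII, including symbols such as the euro sign, which is why the two approaches are not interchangeable.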

Re: Remove letter accents in a string (function)

Posted: 15 Jun 2015, 08:00
by aaffe
You can also substitute via an array and RegExReplace:

Code: Select all

text:="ááããbb"
msgbox % RemoveLetterAccents(text)

RemoveLetterAccents( text )
{
 static Array := {"á": "a", "ã": "a"}
 for key, val in Array
  text:=RegExReplace(text,key,val)
 return text
}

Re: Remove letter accents in a string (function)

Posted: 15 Jun 2015, 08:13
by Guest
@aaffe - that is indeed a nice solution. I would swap the keys and values around and, since RegEx is case-sensitive by default, use two RegExReplace calls per entry to cover both lowercase and uppercase.

Code: Select all

text:="áÁãÃðÐ"
msgbox % RemoveLetterAccents(text)

RemoveLetterAccents(text)
	{
	 static Array := { "a" : "áàâǎăãảạäåāąấầẫẩậắằẵẳặǻ"
	 , "c" : "ćĉčċç"
	 , "d" : "ďđð"
	 , "e" : "éèêěĕẽẻėëēęếềễểẹệ"
	 , "g" : "ğĝġģ"
	 , "h" : "ĥħ"
	 , "i" : "íìĭîǐïĩįīỉị"
	 , "j" : "ĵ"
	 , "k" : "ķ"
	 , "l" : "ĺľļłŀ"
	 , "n" : "ńňñņ"
	 , "o" : "óòŏôốồỗổǒöőõøǿōỏơớờỡởợọộ"
	 , "p" : "ṕṗ"
	 , "r" : "ŕřŗ"
	 , "s" : "śŝšş"
	 , "t" : "ťţŧ"
	 , "u" : "úùŭûǔůüǘǜǚǖűũųūủưứừữửựụ"
	 , "w" : "ẃẁŵẅ"
	 , "y" : "ýỳŷÿỹỷỵ"
	 , "z" : "źžż" }
	 
	 for k, v in Array
		{
		 StringUpper, VU, v
		 StringUpper, KU, k
		 text:=RegExReplace(text,"[" v "]",k)
		 text:=RegExReplace(text,"[" VU "]",KU)
		}
	 Return text
	}

Re: Remove letter accents in a string (function)

Posted: 18 Jun 2015, 08:50
by aaffe
Wow, that's great, Guest!

Re: Remove letter accents in a string (function)

Posted: 19 Jun 2015, 02:46
by Guest
Purely for future reference and people reading this thread:

1 - If you have HTML entities in your text, you can use unhtm() by SKAN to translate them to text first. You can find a working copy of that function in Nextron's script: http://ahkscript.org/boards/viewtopic.p ... ilit=unhtm
(The original code on autohotkey.com is no longer valid due to the forum "upgrade".)
After unhtm() you can run RemoveLetterAccents().

2 - A more elaborate script is "Unidecode" by haichen http://ahkscript.org/boards/viewtopic.php?f=6&t=8257

Re: Remove letter accents in a string (function)

Posted: 01 Feb 2019, 22:22
by IMEime
good, people.

Re: Remove letter accents in a string (function)

Posted: 18 Jul 2019, 05:09
by ScottElliff
Hi there,

Someone mentioned the RemoveLetterAccents tool on Stack Overflow in one of the discussions (I can't find it now, even when filtering by the AutoHotkey tag). Has anyone used it already?