[Solved] How to get the pinyin initials of Chinese characters (AHK Unicode)

Re: [Solved] How to get the pinyin initials of Chinese characters (AHK Unicode)

Post by autu » 20 Dec 2016, 20:48

Thank you so much!
I didn't expect such a quick fix. Thanks again!

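Re: [Solved] How to get the pinyin initials of Chinese characters (AHK Unicode)

Post by tmplinshi » 20 Dec 2016, 09:35

autu wrote:If the string starts with Latin letters, e.g. "Unicode 中运行", it is not converted to initials. Could you please fix this?
Fixed.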

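Re: [Solved] How to get the pinyin initials of Chinese characters (AHK Unicode)

Post by autu » 20 Dec 2016, 08:59

If the string starts with Latin letters, e.g. "Unicode 中运行", it is not converted to initials. Could you please fix this?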

Re: How to get the pinyin initials of Chinese characters (AHK Unicode)

Post by amnesiac » 16 Aug 2014, 03:47

@aamii
Please see http://ahkscript.org/boards/viewtopic.php?f=27&t=4255 for further discussion of your question.

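Re: How to get the pinyin initials of Chinese characters (AHK Unicode)

Post by aamii » 03 Aug 2014, 20:47

For the reverse lookup, this is how I handle it at the moment:
① When generating the initials, include the alternatives for polyphonic characters; for example, "行走" becomes the pattern string [XH][Z].
② Match the typed query against that pattern as a regular expression, so that both XZ and HZ find "行走".

One problem remains: in step ① I currently get the initials from a "Chinese character to pinyin initial" lookup table. Is there a computed method, like tmplinshi's, for obtaining the readings of polyphonic characters?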
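
A rough sketch of steps ① and ② (not aamii's actual code). It assumes AutoHotkey v1.1 Unicode and a script file saved as UTF-8 with BOM; the `initials` table and the `MatchesInitials` function are hypothetical names with made-up sample data.

```autohotkey
; Hypothetical per-character table of possible initials (illustrative sample only).
initials := {"行": "XH", "走": "Z", "银": "Y"}

MsgBox, % MatchesInitials("行走", "hz", initials)  ; 1: "hz" fits [XH][Z]
MsgBox, % MatchesInitials("行走", "yz", initials)  ; 0: "y" is not in [XH]
Return

; Builds a pattern such as [XH][Z] for the word and tests the typed query against it.
MatchesInitials(word, query, table)
{
	pattern := ""
	Loop, Parse, word
	{
		; This sketch gives up on any character missing from the table.
		if !table.HasKey(A_LoopField)
			Return 0
		pattern .= "[" . table[A_LoopField] . "]"
	}
	; "i)" makes the match case-insensitive; ^...$ requires the whole query to match.
	Return RegExMatch(query, "i)^" . pattern . "$") ? 1 : 0
}
```

The pattern is anchored at both ends, so a query must supply one letter per character; dropping the trailing `$` would turn this into a prefix search.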

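Re: How to get the pinyin initials of Chinese characters (AHK Unicode)

Post by amnesiac » 23 May 2014, 19:04

aamii wrote:First of all, thanks to tmplinshi for the function; it is very handy.
In practice we need to handle polyphonic characters, e.g. "行走" is XZ while "银行" is YH.

The main need is the reverse lookup: YH should match 银行, and YX should match it too, the way Total Commander does.
Is that easy to handle in code? Thanks.
You could invert the mapping: build an intermediate structure that maps each initial, or each string of initials, to the characters and words it can stand for. I think an object would work.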
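
One possible shape for such an inverted object, as a sketch only. It assumes AutoHotkey v1.1.22 or later (for `Push`), a Unicode build, and a UTF-8 with BOM script file; `words`, `index`, `Expand`, and `Join` are hypothetical names, and the sample data is made up for illustration.

```autohotkey
; Hypothetical sample data: each word with the possible initials of every character.
words := [ ["行走", ["XH", "Z"]], ["银行", ["Y", "HX"]] ]

; Inverted index: string of initials -> array of words it can stand for.
index := {}
for each, entry in words
{
	for k, key in Expand(entry[2])
	{
		if !index.HasKey(key)
			index[key] := []
		index[key].Push(entry[1])
	}
}

MsgBox, % "YH: " . Join(index["YH"]) . "`nYX: " . Join(index["YX"]) . "`nHZ: " . Join(index["HZ"])
Return

; Returns every combination of one letter per character, e.g. ["XH","Z"] -> ["XZ","HZ"].
Expand(sets)
{
	keys := [""]
	for i, set in sets
	{
		next := []
		for j, prefix in keys
		{
			Loop, Parse, set
				next.Push(prefix . A_LoopField)
		}
		keys := next
	}
	Return keys
}

; Joins an array of strings with ", ".
Join(arr)
{
	out := ""
	for i, v in arr
		out .= (i > 1 ? ", " : "") . v
	Return out
}
```

With this data both YH and YX lead to 银行, and HZ leads to 行走. The number of keys per word grows with the product of the alternatives, so this suits short words better than long phrases.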

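Re: How to get the pinyin initials of Chinese characters (AHK Unicode)

Post by aamii » 23 May 2014, 04:00

First of all, thanks to tmplinshi for the function; it is very handy.
In practice we need to handle polyphonic characters, e.g. "行走" is XZ while "银行" is YH.

The main need is the reverse lookup: YH should match 银行, and YX should match it too, the way Total Commander does.
Is that easy to handle in code? Thanks.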

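Re: [Solved] How to get the pinyin initials of Chinese characters (AHK Unicode)

Post by tmplinshi » 26 Jan 2014, 08:02

The code below was converted from PHP code. It supports more Chinese characters than the earlier getfirstchar function.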

Code: Select all

MsgBox, % zh2py("二级汉字 -> 廿") ; shows "EJHZ -> N"
Return

; Converted from PHP (http://www.sjyhome.com/php/201311170606.html)
zh2py(str)
{
	; Based on the GB2312 row-cell (区位) code table (http://www.mytju.com/classcode/tools/QuWeiMa_FullList.asp):
	; rows 16-55 are sorted by pinyin, so a character's row-cell code alone is enough to determine its pinyin initial.

	; Part 1 of the table: sorted by pinyin.
	; Rows 16-55
	/*
		'A'=>0xB0A1, 'B'=>0xB0C5, 'C'=>0xB2C1, 'D'=>0xB4EE, 'E'=>0xB6EA, 'F'=>0xB7A2, 'G'=>0xB8C1,'H'=>0xB9FE,
		'J'=>0xBBF7, 'K'=>0xBFA6, 'L'=>0xC0AC, 'M'=>0xC2E8, 'N'=>0xC4C3, 'O'=>0xC5B6, 'P'=>0xC5BE,'Q'=>0xC6DA,
		'R'=>0xC8BB, 'S'=>0xC8F6, 'T'=>0xCBFA, 'W'=>0xCDDA, 'X'=>0xCEF4, 'Y'=>0xD1B9, 'Z'=>0xD4D1
	*/
	static FirstTable := [ 0xB0C5, 0xB2C1, 0xB4EE, 0xB6EA, 0xB7A2, 0xB8C1, 0xB9FE, 0xBBF7, 0xBFA6, 0xC0AC, 0xC2E8
	           , 0xC4C3, 0xC5B6, 0xC5BE, 0xC6DA, 0xC8BB, 0xC8F6, 0xCBFA, 0xCDDA, 0xCEF4, 0xD1B9, 0xD4D1, 0xD7FA ]
	static FirstLetter := StrSplit("ABCDEFGHJKLMNOPQRSTWXYZ")

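	; Part 2 of the table: not sorted. Each string below lists the pinyin initials of the characters in one row, in cell order.
	; Compiled from online sources, so it may contain some errors.
	; Rows 56-87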
	static SecondTable := [ StrSplit("CJWGNSPGCGNEGYPBTYYZDXYKYGTZJNMJQMBSGZSCYJSYYFPGKBZGYDYWJKGKLJSWKPJQHYJWRDZLSYMRYPYWWCCKZNKYYG")
	           , StrSplit("TTNGJEYKKZYTCJNMCYLQLYPYSFQRPZSLWBTGKJFYXJWZLTBNCXJJJJTXDTTSQZYCDXXHGCKBPHFFSSTYBGMXLPBYLLBHLX")
	           , StrSplit("SMZMYJHSOJNGHDZQYKLGJHSGQZHXQGKXZZWYSCSCJXYEYXADZPMDSSMZJZQJYZCJJFWQJBDZBXGZNZCPWHWXHQKMWFBPBY")
	           , StrSplit("DTJZZKXHYLYGXFPTYJYYZPSZLFCHMQSHGMXXSXJYQDCSBBQBEFSJYHWWGZKPYLQBGLDLCDTNMAYDDKSSNGYCSGXLYZAYPN")
	           , StrSplit("PTSDKDYLHGYMYLCXPYCJNDQJWXQXFYYFJLEJPZRXCCQWQQSBZKYMGPLBMJRQCFLNYMYQMSQYRBCJTHZTQFRXQHXMQJCJLY")
	           , StrSplit("QGJMSHZKBSWYEMYLTXFSYDXWLYCJQXSJNQBSCTYHBFTDCYZDJWYGHQFRXWCKQKXEBPTLPXJZSRMEBWHJLBJSLYYSMDXLCL")
	           , StrSplit("QKXLHXJRZJMFQHXHWYWSBHTRXXGLHQHFNMGYKLDYXZPYLGGSMTCFBAJJZYLJTYANJGBJPLQGSZYQYAXBKYSECJSZNSLYZH")
	           , StrSplit("ZXLZCGHPXZHZNYTDSBCJKDLZAYFFYDLEBBGQYZKXGLDNDNYSKJSHDLYXBCGHXYPKDJMMZNGMMCLGWZSZXZJFZNMLZZTHCS")
	           , StrSplit("YDBDLLSCDDNLKJYKJSYCJLKWHQASDKNHCSGAGHDAASHTCPLCPQYBSZMPJLPCJOQLCDHJJYSPRCHNWJNLHLYYQYYWZPTCZG")
	           , StrSplit("WWMZFFJQQQQYXACLBHKDJXDGMMYDJXZLLSYGXGKJRYWZWYCLZMSSJZLDBYDCFCXYHLXCHYZJQSQQAGMNYXPFRKSSBJLYXY")
	           , StrSplit("SYGLNSCMHCWWMNZJJLXXHCHSYZSTTXRYCYXBYHCSMXJSZNPWGPXXTAYBGAJCXLYXDCCWZOCWKCCSBNHCPDYZNFCYYTYCKX")
	           , StrSplit("KYBSQKKYTQQXFCMCHCYKELZQBSQYJQCCLMTHSYWHMKTLKJLYCXWHEQQHTQKZPQSQSCFYMMDMGBWHWLGSLLYSDLMLXPTHMJ")
	           , StrSplit("HWLJZYHZJXKTXJLHXRSWLWZJCBXMHZQXSDZPSGFCSGLSXYMJSHXPJXWMYQKSMYPLRTHBXFTPMHYXLCHLHLZYLXGSSSSTCL")
	           , StrSplit("SLDCLRPBHZHXYYFHBMGDMYCNQQWLQHJJCYWJZYEJJDHPBLQXTQKWHLCHQXAGTLXLJXMSLJHTZKZJECXJCJNMFBYCSFYWYB")
	           , StrSplit("JZGNYSDZSQYRSLJPCLPWXSDWEJBJCBCNAYTWGMPAPCLYQPCLZXSBNMSGGFNZJJBZSFZYNTXHPLQKZCZWALSBCZJXSYZGWK")
	           , StrSplit("YPSGXFZFCDKHJGXTLQFSGDSLQWZKXTMHSBGZMJZRGLYJBPMLMSXLZJQQHZYJCZYDJWFMJKLDDPMJEGXYHYLXHLQYQHKYCW")
	           , StrSplit("CJMYYXNATJHYCCXZPCQLBZWWYTWBQCMLPMYRJCCCXFPZNZZLJPLXXYZTZLGDLTCKLYRZZGQTTJHHHJLJAXFGFJZSLCFDQZ")
	           , StrSplit("LCLGJDJZSNZLLJPJQDCCLCJXMYZFTSXGCGSBRZXJQQCTZHGYQTJQQLZXJYLYLBCYAMCSTYLPDJBYREGKLZYZHLYSZQLZNW")
	           , StrSplit("CZCLLWJQJJJKDGJZOLBBZPPGLGHTGZXYGHZMYCNQSYCYHBHGXKAMTXYXNBSKYZZGJZLQJTFCJXDYGJQJJPMGWGJJJPKQSB")
	           , StrSplit("GBMMCJSSCLPQPDXCDYYKYPCJDDYYGYWRHJRTGZNYQLDKLJSZZGZQZJGDYKSHPZMTLCPWNJYFYZDJCNMWESCYGLBTZZGMSS")
	           , StrSplit("LLYXYSXXBSJSBBSGGHFJLYPMZJNLYYWDQSHZXTYYWHMCYHYWDBXBTLMSYYYFSXJCBDXXLHJHFSSXZQHFZMZCZTQCXZXRTT")
	           , StrSplit("DJHNRYZQQMTQDMMGNYDXMJGDXCDYZBFFALLZTDLTFXMXQZDNGWQDBDCZJDXBZGSQQDDJCMBKZFFXMKDMDSYYSZCMLJDSYN")
	           , StrSplit("SPRSKMKMPCKLGTBQTFZSWTFGGLYPLLJZHGJJGYPZLTCSMCNBTJBQFKDHBYZGKPBBYMTDSSXTBNPDKLEYCJNYCDYKZTDHQH")
	           , StrSplit("SYZSCTARLLTKZLGECLLKJLQJAQNBDKKGHPJTZQKSECSHALQFMMGJNLYJBBTMLYZXDXJPLDLPCQDHZYCBZSCZBZMSLJFLKR")
	           , StrSplit("ZJSNFRGJHXPDHYJYBZGDLQCSEZGXLBLGYXTWMABCHECMWYJYZLLJJYHLGNDJLSLYGKDZPZXJYYZLWCXSZFGWYYDLYHCLJS")
	           , StrSplit("CMBJHBLYZLYCBLYDPDQYSXQZBYTDKYXJYYCNRJMPDJGKLCLJBCTBJDDBBLBLCZQRPYXJCJLZCSHLTOLJNMDDDLNGKATHQH")
	           , StrSplit("JHYKHEZNMSHRPHQQJCHGMFPRXHJGDYCHGHLYRZQLCYQJNZSQTKQJYMSZSWLCFQQQXYFGGYPTQWLMCRNFKKFSYYLQBMQAMM")
	           , StrSplit("MYXCTPSHCPTXXZZSMPHPSHMCLMLDQFYQXSZYJDJJZZHQPDSZGLSTJBCKBXYQZJSGPSXQZQZRQTBDKYXZKHHGFLBCSMDLDG")
	           , StrSplit("DZDBLZYYCXNNCSYBZBFGLZZXSWMSCCMQNJQSBDQSJTXXMBLTXZCLZSHZCXRQJGJYLXZFJPHYMZQQYDFQJJLZZNZJCDGZYG")
	           , StrSplit("CTXMZYSCTLKPHTXHTLBJXJLXSCDQXCBBTJFQZFSLTJBTKQBXXJJLJCHCZDBZJDCZJDCPRNPQCJPFCZLCLZXZDMXMPHJSGZ")
	           , StrSplit("GSZZQLYLWTJPFSYASMCJBTZYYCWMYTZSJJLJCQLWZMALBXYFBPNLSFHTGJWEJJXXGLLJSTGSHJQLZFKCGNNNSZFDEQFHBS")
	           , StrSplit("AQTGYLBXMMYGSZLDYDQMJJRGBJTKGDHGKBLQKBDMBYLXWCXYTTYBKMRTJZXQJBHLMHMJJZMQASLDCYXYQDLQCAFYWYXQHZ") ]


	static nothing := VarSetCapacity(var, 2)
	
	; If the string contains no characters above U+00FF (and so no Chinese), return it unchanged
	if !RegExMatch(str, "[^\x{00}-\x{ff}]")
		Return str
	
	Loop, Parse, str
	{
		StrPut(A_LoopField, &var, "CP936")
		H := NumGet(var, 0, "UChar")
		L := NumGet(var, 1, "UChar")
		
		; Outside the GB2312 Chinese-character area: keep the character unchanged
		if (H < 0xB0 || L < 0xA1 || H > 0xF7 || L = 0xFF)
		{
			newStr .= A_LoopField
			Continue
		}
		
		if (H < 0xD8) ; i.e. H >= 0xB0 && H <= 0xD7: the character is in the level-1 area (rows 16-55)
		{
			W := (H << 8) | L
			For key, value in FirstTable
			{
				if (W < value)
				{
					newStr .= FirstLetter[key]
					Break
				}
			}
		}
		else ; if (H >= 0xD8 && H <= 0xF7): the character is in the level-2 area (rows 56-87)
			newStr .= SecondTable[ H - 0xD8 + 1 ][ L - 0xA1 + 1 ]
	}
	
	Return newStr
}
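
To make the row and cell arithmetic in zh2py easier to follow, here is a small sketch that prints the GBK bytes and the row-cell code of a single character. It assumes AutoHotkey v1.1.17 or later (for `Format`), a Unicode build, and a UTF-8 with BOM script file; the variable names are illustrative only.

```autohotkey
char := "中"
VarSetCapacity(buf, 4, 0)            ; room for 2 GBK bytes plus the null terminator StrPut writes
StrPut(char, &buf, "CP936")
H := NumGet(buf, 0, "UChar")         ; lead byte
L := NumGet(buf, 1, "UChar")         ; trail byte
row  := H - 0xA0                     ; row (区)
cell := L - 0xA0                     ; cell (位)
MsgBox, % Format("GBK 0x{:02X}{:02X}, row {}, cell {}", H, L, row, cell)
; For "中" (GBK 0xD6D0) this reports row 54, cell 48, inside the pinyin-sorted rows 16-55.
```

In the same terms, `SecondTable[ H - 0xD8 + 1 ][ L - 0xA1 + 1 ]` is simply entry `[row - 55][cell]`: the first index counts rows starting at row 56, the second is the cell number within that row.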

Re: How to get the pinyin initials of Chinese characters (AHK Unicode)

Post by tmplinshi » 25 Jan 2014, 10:36

Thank you so much!

Code: Select all

MsgBox, % GetFirstChar("请在 AHK Unicode 中运行") ; shows "QZ AHK Unicode ZYX"
Return

; Purpose: convert Chinese characters to their pinyin initials; leave everything else unchanged
; Note: run under AutoHotkey Unicode
GetFirstChar(str)
{
	static nothing := VarSetCapacity(var, 2)
	static array  := [ [-20319,-20284,"A"], [-20283,-19776,"B"], [-19775,-19219,"C"], [-19218,-18711,"D"], [-18710,-18527,"E"], [-18526,-18240,"F"], [-18239,-17923,"G"], [-17922,-17418,"H"], [-17417,-16475,"J"], [-16474,-16213,"K"], [-16212,-15641,"L"], [-15640,-15166,"M"], [-15165,-14923,"N"], [-14922,-14915,"O"], [-14914,-14631,"P"], [-14630,-14150,"Q"], [-14149,-14091,"R"], [-14090,-13319,"S"], [-13318,-12839,"T"], [-12838,-12557,"W"], [-12556,-11848,"X"], [-11847,-11056,"Y"], [-11055,-10247,"Z"] ]
	
	; If the string contains no characters above U+00FF (and so no Chinese), return it unchanged
	if !RegExMatch(str, "[^\x{00}-\x{ff}]")
		Return str

	Loop, Parse, str
	{
		if ( Asc(A_LoopField) >= 0x2E80 and Asc(A_LoopField) <= 0x9FFF )
		{
			StrPut(A_LoopField, &var, "CP936")
			nGBKCode := (NumGet(var, 0, "UChar") << 8) + NumGet(var, 1, "UChar") - 65536

			For i, a in array
				if nGBKCode between % a.1 and % a.2
				{
					out .= a.3
					Break
				}
		}
		else
			out .= A_LoopField
	}

	Return out
}

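Re: How to get the pinyin initials of Chinese characters (AHK Unicode)

Post by amnesiac » 24 Jan 2014, 21:35

Because this question is a representative one, I made an exception to my habit of not touching other people's scripts. I changed two things in your getfirstchar(str) function: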

Code: Select all

StrPut(str, &var, "CP936")  ; convert to GBK; your system code page is unknown, so use CP936 explicitly

Code: Select all

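nGBKCode := (NumGet(var, 0, "UChar") << 8) + NumGet(var, 1, "UChar") - 65536  ; a multi-byte NumGet reads the first byte as the low-order byte, whatever the AHK version, so combine the bytes by hand
By the way, this second point came up before, in the discussion of how values are stored in memory: low byte (LChar) versus high byte (WChar), as well as low word versus high word, and so on.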
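
The byte-order point can be checked with a short sketch. It assumes AutoHotkey v1.1.17 or later (for `Format`), a Unicode build, and a UTF-8 with BOM script file; the variable names are illustrative only.

```autohotkey
VarSetCapacity(buf, 4, 0)
StrPut("中", &buf, "CP936")                 ; buf now holds the bytes D6 D0 00
asUShort := NumGet(buf, 0, "UShort")        ; little-endian read: 0xD0D6, bytes swapped
manual   := (NumGet(buf, 0, "UChar") << 8) | NumGet(buf, 1, "UChar")  ; 0xD6D0, the GBK code
MsgBox, % Format("UShort: 0x{:04X}`nManual: 0x{:04X}`nSigned: {}", asUShort, manual, manual - 65536)
; Signed value: 0xD6D0 - 65536 = -10544, which falls in the "Z" range [-11055, -10247].
```

Reading the two bytes as one `UShort` (or `UInt`) puts the lead byte in the low-order position, so the result no longer lines up with the range table; combining the bytes by hand avoids that.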

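Re: How to get the pinyin initials of Chinese characters (AHK Unicode)

Post by tmplinshi » 24 Jan 2014, 10:20

Thanks! So in AHK Unicode, how do I convert a character to GBK encoding?

The "convert to GBK encoding" line below is taken from one of your posts, but it gives a different result from AHK ANSI.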

Code: Select all

MsgBox, % Getfirstchar("中")
Return

getfirstchar(str)
{
	static array := [ [-20319,-20284,"A"], [-20283,-19776,"B"], [-19775,-19219,"C"], [-19218,-18711,"D"], [-18710,-18527,"E"], [-18526,-18240,"F"], [-18239,-17923,"G"], [-17922,-17418,"H"], [-17417,-16475,"J"], [-16474,-16213,"K"], [-16212,-15641,"L"], [-15640,-15166,"M"], [-15165,-14923,"N"], [-14922,-14915,"O"], [-14914,-14631,"P"], [-14630,-14150,"Q"], [-14149,-14091,"R"], [-14090,-13319,"S"], [-13318,-12839,"T"], [-12838,-12557,"W"], [-12556,-11848,"X"], [-11847,-11056,"Y"], [-11055,-10247,"Z"] ]

	; nGBKCode := ( Asc( SubStr(str, 1, 1) ) << 8 ) + Asc( SubStr(str, 2, 1) ) - 65536
	VarSetCapacity(var, 2)
	StrPut(str, &var, "cp0")  ; convert to GBK encoding
	nGBKCode := NumGet(var, 0, "UInt") - 65536

	For i, a in array
		if nGBKCode between % a.1 and % a.2
			Return a.3
}

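Re: How to get the pinyin initials of Chinese characters (AHK Unicode)

Post by amnesiac » 24 Jan 2014, 06:34

In Unicode, Chinese characters are ordered by radical, but in GBK they are ordered by pinyin, so the idea is to take advantage of that.
First convert the character to GBK, then read its code and test which range it falls in (as in the example above). I haven't written code in a long while, so this is untested.

If you don't mind the tedium, you can also build your own mapping table.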

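[Solved] How to get the pinyin initials of Chinese characters (AHK Unicode)

Post by tmplinshi » 23 Jan 2014, 11:35

How can I get the pinyin initial of a Chinese character in the AHK U32 build?

The function below only works in ANSI builds: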

Code: Select all

MsgBox, % Getfirstchar("中")
Return

getfirstchar(str)
{
	static array := [ [-20319,-20284,"A"], [-20283,-19776,"B"], [-19775,-19219,"C"], [-19218,-18711,"D"], [-18710,-18527,"E"], [-18526,-18240,"F"], [-18239,-17923,"G"], [-17922,-17418,"H"], [-17417,-16475,"J"], [-16474,-16213,"K"], [-16212,-15641,"L"], [-15640,-15166,"M"], [-15165,-14923,"N"], [-14922,-14915,"O"], [-14914,-14631,"P"], [-14630,-14150,"Q"], [-14149,-14091,"R"], [-14090,-13319,"S"], [-13318,-12839,"T"], [-12838,-12557,"W"], [-12556,-11848,"X"], [-11847,-11056,"Y"], [-11055,-10247,"Z"] ]

	nGBKCode := ( Asc( SubStr(str, 1, 1) ) << 8 ) + Asc( SubStr(str, 2, 1) ) - 65536

	For i, a in array
		if nGBKCode between % a.1 and % a.2
			Return a.3
}
I have read quite a few posts and articles, but I still don't really understand it. Related articles:
