AutoHotkey Community

gender-verification by forename (cmd-line-tool & db)

 
AutoHotkey Community Forum Index -> Utilities & Resources
Zed Gecko



Joined: 23 Sep 2006
Posts: 120

Posted: Sun Aug 12, 2007 5:14 pm    Post subject: gender-verification by forename (cmd-line-tool & db)

Recently published by a German magazine: a tool that decides whether a first name is female or male.

ftp://ftp.heise.de/pub/ct/listings/0717-182.zip
www.heise.de/ct soft-link 0717182

This zip contains a command-line tool, its C source, and the text file with the match data for about 40,000 names.
Quote:

Overview of the program "gender" by Jörg MICHAEL


The program "gender.c" is a program for determining the gender of a given
first name.

List of files:

a) gen_ext.h (contains macros and prototypes; may be changed)
b) umlaut.h (contains lists of umlauts)
c) gender.c (this is the "workhorse" of the program)
d) nam_dict.txt (dictionary file containing first names)

The file "nam_dict.txt", which contains a list of first names, uses the
char set "iso8859-1".

If you want to use "gender.c" as a library, delete the line
"#define GENDER_EXECUTABLE" from the file "gen_ext.h".


========================================================================


The dictionary file "nam_dict.txt"


The program "gender.c" uses the dictionary file "nam_dict.txt" as a data
source. This file contains a list of more than 40,000 first names with
their genders, plus some 600 pairs of "equivalent" names.
This list should be able to cover the vast majority of first names
in all European countries and in some overseas countries (e.g. China,
India, Japan, U.S.A.) as well.

Also included in this file is information on the approximate frequency
of each name. The scale goes from 1 (=rare) to 13 (=extremely common).
The value 10 has been defined to represent at least 2 percent of
the population. (The values 11 to 13 were added later.)
The scale is logarithmic. For countries with very good statistics,
each step (down to frequency 2) represents a factor of 2.
For example, a frequency value of 7 means that the corresponding first
name has an absolute frequency in the range of 0.25 % to 0.5 %.
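A minimal C sketch (not part of gender.c) of how a frequency value maps to a population share, assuming only what the paragraph states: value 10 stands for 2 % and each step down to value 2 halves the share. The helper name `freq_range_percent` is hypothetical; values 1, 10 and 11 to 13 are rejected because the text gives no closed range for them.

```c
#include <assert.h>

/* Hypothetical helper, not from gender.c: convert a frequency value
   from nam_dict.txt (2..9) into the range of population share, in
   percent, that it represents. Assumes value 10 = 2 % and a factor
   of 2 per step, as described in the text. */
void freq_range_percent(int value, double *low, double *high)
{
    assert(value >= 2 && value <= 9);
    *low = 2.0;                  /* share at value 10 */
    for (int v = 10; v > value; v--)
        *low /= 2.0;             /* each step down halves the share */
    *high = *low * 2.0;          /* the range ends where the next value starts */
}
```

For value 7 the function yields 0.25 and 0.5, the range given in the example above.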

The sorting order of the file "nam_dict.txt" is governed by the search
algorithm of the program "gender.c". Hence, names with "expandable"
umlauts can be found twice in this dictionary, first with sorting
according to "expanded" umlauts, and second with sorting according to
"compressed" umlauts (e.g. 'Ö' is sorted like "Oe" and 'O').

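The two sort keys can be illustrated with a short C sketch. It is not the code of gender.c; it only assumes ISO-8859-1 input (as stated for nam_dict.txt) and handles just the six German umlauts. The function name `umlaut_key` is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only, not the code of gender.c: build one of the two
   sort keys for an ISO-8859-1 string. With expand != 0 each umlaut
   becomes two letters (0xD6 'Ö' -> "Oe"); with expand == 0 it is
   "compressed" to the bare vowel ('O'). Output is truncated to fit
   and is always NUL-terminated. */
void umlaut_key(const char *in, char *out, size_t outsz, int expand)
{
    size_t n = 0;
    assert(outsz > 0);
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        char base = 0;
        switch (c) {
        case 0xC4: base = 'A'; break;  /* A umlaut */
        case 0xD6: base = 'O'; break;  /* O umlaut */
        case 0xDC: base = 'U'; break;  /* U umlaut */
        case 0xE4: base = 'a'; break;  /* a umlaut */
        case 0xF6: base = 'o'; break;  /* o umlaut */
        case 0xFC: base = 'u'; break;  /* u umlaut */
        }
        if (base == 0) {               /* ordinary character: copy as-is */
            if (n + 1 < outsz) out[n++] = *in;
        } else {                       /* umlaut: vowel, plus 'e' if expanding */
            if (n + 1 < outsz) out[n++] = base;
            if (expand && n + 1 < outsz) out[n++] = 'e';
        }
    }
    out[n] = '\0';
}
```

For "Jörg" (with 'ö' stored as byte 0xF6) the expanded key is "Joerg" and the compressed key is "Jorg", which is why such a name has to appear at two positions in the sorted file.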
You don't have to reformat this file for use in a Unix environment,
because the DOS line endings (the trailing '\r') are ignored when the
file is read.
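A minimal C sketch of the idea, not taken from gender.c: after a line has been read (for example with fgets), strip the trailing '\n' together with any '\r' before it, so that DOS and Unix lines end up identical. The name `chomp_line` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not from gender.c: remove trailing '\n' and
   '\r' characters in place, so "Anna\r\n" and "Anna\n" both become
   "Anna". Returns its argument for convenience. */
char *chomp_line(char *line)
{
    size_t len;
    assert(line != NULL);
    len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        line[--len] = '\0';
    return line;
}
```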


========================================================================


A few words on quality of data


The dictionary of first names has been prepared with utmost care.

For example, the Turkish, Indian and Korean names in this dictionary
have all been independently classified by several native speakers.
I also took special care to list only those names which are actually
in use today.

The lesson from this?
Any modifications should be done very cautiously (and they must also
adhere to the sorting required by the search algorithm).

For example, knowing that "Sascha" is a boy's name in Germany, I would
never have guessed that the English "Sasha" is a girl's name.
Knowing that "Jan" is a boy's name in Germany, I would never have guessed
that it is also an English short form of "Janet". Another case in point
is the name "Esra". This is a boy's name in Germany, but a girl's name
in Turkey.

Or consider the following first names:

Ildikó female Hungarian name
Mitja male Russian name
Elizaveta rare name; looks like a misspelling of "Elizabeta"
Roelf rare name; looks like German "Rolf" with an erroneous 'e'

Borchert, Oltmann, Sievert, Hartmann look like common German surnames



The tool is released under the LGPL.


I have created a little GUI for the main functions of gender.exe.
The script should be stored in the same directory as gender.exe and nam_dict.txt.



Code:
#NoTrayIcon
SetWorkingDir %A_ScriptDir%
;------------auto-execute----------------------------------------------------
IfNotExist, gender.exe
{
    MsgBox, gender.exe not found!
    ExitApp
}
IfNotExist, nam_dict.txt
{
    MsgBox, nam_dict.txt not found!
    ExitApp
}

Gui, +Resize
Gui, Margin, 3, 3
Gui, Add, Tab, w284 h260 vMyTab, Get Gender|Check Nickname|List Names|Statistics

Gui, Tab, Get Gender
Gui, Font, S11
Gui, Add, Edit, x8 y30 R1 W270 vMyNameString
Gui, Font, S8
Gui, Add, Button, Default gCheckGender x8 y+5, &Check Gender
Gui, Add, Button, gCheckGenderTrace x+38, Check Gender (Display&Trace)
Gui, Font, S11
Gui, Add, Edit, R9 W270 x8 y+5 vMyResultField ReadOnly
Gui, Font, S8
Gui, Add, Checkbox, x8 y+5 vUseHotkey gActivateHotkey, Use Alt+G to check selected text for gender

Gui, Tab, Check Nickname
Gui, Font, S11
Gui, Add, Text, x8 y33, Name 1:
Gui, Add, Edit, x58 y30 R1 W220 vMyNickAString
Gui, Add, Text, x8 y63, Name 2:
Gui, Add, Edit, x58 y60 R1 W220 vMyNickBString
Gui, Font, S8
Gui, Add, Button, gCheckNick x59 y90, Check if two first &names are "equivalent"
Gui, Font, S11
Gui, Add, Edit, R8 W270 x8 y+5 vMyNickResultField ReadOnly

Gui, Tab, List Names
Gui, Add, Text, x8 y33, Country :
Gui, Add, Edit, x60 y30 R1 W218 vMyCountryString
Gui, Font, S8
Gui, Add, Button, gListNames x61 y60, &List all names of the given country.
Gui, Font, S11
Gui, Add, Edit, R10 W270 x8 y+5 vMyCountryResultField ReadOnly

Gui, Tab, Statistics
Gui, Font, S8
Gui, Add, Button, gShowStats x8 y33, &Show statistics
Gui, Font, S11
Gui, Add, Edit, R11 W270 x8 y+7 vMyStatResultField ReadOnly

Gui, Show, , Gender Verification
return


;--------------End-auto-execute----------------------------------------------

;--------------gender.exe related--------------------------------------------
CheckGender:
Gui, Submit, Nohide
Gui +Disabled
Gui, Flash
StringLeft, MyNameString, MyNameString, 100
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-get_gender" "%MyNameString% " >"RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
   GuiControl, , MyResultField, Calling gender.exe produced an error!
else
{
   FileRead, MyResult, Result.txt
   GuiControl, , MyResultField, %MyResult%
}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
return


CheckGenderTrace:
Gui, Submit, Nohide
Gui +Disabled
Gui, Flash
StringLeft, MyNameString, MyNameString, 100
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-get_gender" "%MyNameString% " "-trace" >"RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
   GuiControl, , MyResultField, Calling gender.exe produced an error!
else
{
   FileRead, MyResult, Result.txt
   GuiControl, , MyResultField, %MyResult%
}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
return


CheckSelectedforGender:
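ClipSaved := ClipboardAll
Clipboard := ""  ; empty the clipboard so that ClipWait waits for the new copy
Send ^c
ClipWait, 4
if ErrorLevel
{
    GuiControl, , MyResultField, The attempt to copy text onto the clipboard failed.
    Clipboard := ClipSaved  ; restore the original clipboard contents
    ClipSaved =
    return
}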
Loop, parse, Clipboard, `n, `r  ; Specifying `n prior to `r allows both Windows and Unix files to be parsed.
{
    MyNameString := A_LoopField
    break
}
StringLeft, MyNameString, MyNameString, 100
Gui +Disabled
Gui, Flash
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-get_gender" "%MyNameString% " >"RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
   GuiControl, , MyResultField, Calling gender.exe produced an error!
else
{
   FileRead, MyResult, Result.txt
   GuiControl, , MyNameString, %MyNameString%
   GuiControl, , MyResultField, %MyResult%
   GuiControl, , MyNickAString, %MyNameString%
   ToolTip, %MyResult%
   SetTimer, RemoveToolTip, 5000

}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
Clipboard := ClipSaved
ClipSaved =
return

RemoveToolTip:
SetTimer, RemoveToolTip, Off
ToolTip
return


CheckNick:
Gui +Disabled
Gui, Flash
Gui, Submit, Nohide
StringLeft, MyNickAString, MyNickAString, 100  ; limit both names to 100 characters
StringLeft, MyNickBString, MyNickBString, 100
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-check_nickname" "%MyNickAString% " "%MyNickBString% " >"RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
   GuiControl, , MyNickResultField, Calling gender.exe produced an error!
else
{
   FileRead, MyResult, Result.txt
   GuiControl, , MyNickResultField, %MyResult%
}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
return


ListNames:
Gui, Submit, Nohide
Gui +Disabled
Gui, Flash
StringLeft, MyCountryString, MyCountryString, 100  ; limit the country string to 100 characters
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-print_names_of_country" "%MyCountryString%" "RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
   GuiControl, , MyCountryResultField, Calling gender.exe produced an error!
else
{
   FileRead, MyResult, Result.txt
   GuiControl, , MyCountryResultField, %MyResult%
}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
return


ShowStats:
Gui +Disabled
Gui, Flash
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-statistics" >"RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
   GuiControl, , MyStatResultField, Calling gender.exe produced an error!
else
{
   FileRead, MyResult, Result.txt
   GuiControl, , MyStatResultField, %MyResult%
}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
return
;---------------End-gender.exe related----------------------------------------

;---------------Hotkey related------------------------------------------------
ActivateHotkey:
Gui, Submit, Nohide
if (UseHotkey = 1)
{
   Hotkey, !g, CheckSelectedforGender, ON

}
if (UseHotkey = 0)
{
   Hotkey, !g, CheckSelectedforGender, OFF
}
return
;---------------End-Hotkey related--------------------------------------------

;---------------Gui related---------------------------------------------------
GuiSize:
if (A_EventInfo = 1)  ; the window was minimized: leave the layout alone
   return
if (A_GuiWidth < 290)
   Gui, Show, w290
if (A_GuiHeight < 260)
   Gui, Show, h260
; keep a space between the w and h options
GuiControl, Move, MyTab, % "w" A_GuiWidth-6 " h" A_GuiHeight-6
GuiControl, Move, MyResultField, % "w" A_GuiWidth-15 " h" A_GuiHeight-115
GuiControl, Move, UseHotkey, % "y" A_GuiHeight-25
GuiControl, Move, MyNickResultField, % "w" A_GuiWidth-15 " h" A_GuiHeight-130
GuiControl, Move, MyCountryResultField, % "w" A_GuiWidth-15 " h" A_GuiHeight-100
GuiControl, Move, MyStatResultField, % "w" A_GuiWidth-15 " h" A_GuiHeight-75
return

GuiClose:
ExitApp

;---------------End-Gui related-----------------------------------------------


The exe- and the ahk-file can be downloaded here: http://www.autohotkey.net/~Zed_Gecko/gender/WinGender2.zip


Last edited by Zed Gecko on Fri Jan 09, 2009 5:38 pm; edited 2 times in total
Zed Gecko



Joined: 23 Sep 2006
Posts: 120

Posted: Fri Jan 09, 2009 5:16 pm

The author of gender.exe has published a new version.
You can download it
from http://www.heise.de/ct/ftp/07/17/182/
or from http://www.autohotkey.net/~Zed_Gecko/gender/0717-182.zip

WinGender has been updated to work with the new version (see the link and code above).
_________________
1) All my code can be reused in ANY way. 2) Please check the help and the forum-search, before posting questions; the answer is out there...
hugov



Joined: 27 May 2007
Posts: 2181

Posted: Sat Jan 10, 2009 12:25 pm

Very useful, thanks.
All times are GMT