gender-verification by forename (cmd-line-tool & db)


Zed Gecko
Recently published by a German magazine: a tool that decides whether a forename is female or male.

ftp://ftp.heise.de/pub/ct/listings/0717-182.zip
www.heise.de/ct soft-link 0717182

The zip contains a command-line tool, the C source and the text file with the match data for about 40,000 names.

Overview of the program "gender" by Jörg MICHAEL


The program "gender.c" is a program for determining the gender of a given
first name.

List of files:

a) gen_ext.h (contains macros and prototypes; may be changed)
b) umlaut.h (contains lists of umlauts)
c) gender.c (this is the "workhorse" of the program)
d) nam_dict.txt (dictionary file containing first names)

The file "nam_dict.txt", which contains a list of first names, uses the
char set "iso8859-1".

If you want to use "gender.c" as a library, delete the line
"#define GENDER_EXECUTABLE" from the file "gen_ext.h".


========================================================================


The dictionary file "nam_dict.txt"


The program "gender.c" uses the dictionary file "nam_dict.txt" as a data
source. This file contains a list of more than 40,000 first names with
their gender, plus some 600 pairs of "equivalent" names.
This list should be able to cover the vast majority of first names
in all European countries and in some overseas countries (e.g. China,
India, Japan, U.S.A.) as well.

Also included in this file is information on the approximate frequency
of each name. The scale goes from 1 (=rare) to 13 (=extremely common).
The value 10 has been set to represent at least 2 percent of
the population. (The values 11 to 13 have been added last.)
The scale is logarithmic. For countries with very good statistics,
each step (down to frequency 2) represents a factor of 2.
For example, a frequency value of 7 means that the corresponding first
name has an absolute frequency in the range of 0.25 % to 0.5 %.
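
The halving rule can be turned into a small calculation. The following Python sketch is not part of the package; the function name is made up for illustration, and it assumes the idealised scale (10 means "at least 2 %", each step down halves the share), which the README only describes for countries with very good statistics and for values down to 2.

```python
from typing import Optional, Tuple


def frequency_range(value: int) -> Tuple[float, Optional[float]]:
    """Return (lower, upper) population share in percent for a frequency value.

    Assumption: value 10 is anchored at 2 % and every step down halves
    the share. Values 1 and 11..13 are not covered by the halving rule
    as described, so they are rejected here.
    """
    if not 2 <= value <= 10:
        raise ValueError("halving rule is only described for values 2..10")
    lower = 2.0 / 2 ** (10 - value)
    # Value 10 means "at least 2 %", so it has no upper bound in this sketch.
    upper = None if value == 10 else lower * 2
    return lower, upper


print(frequency_range(7))   # (0.25, 0.5)
print(frequency_range(10))  # (2.0, None)
```

The result for 7 matches the range quoted in the README text above.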

The sorting order of the file "nam_dict.txt" is governed by the search
algorithm of the program "gender.c". Hence, names with "expandable"
umlauts can be found twice in this dictionary, first with sorting
according to "expanded" umlauts, and second with sorting according to
"compressed" umlauts (e.g. 'Ö' is sorted like "Oe" and 'O').
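
To illustrate why such a name appears twice, here is a minimal Python sketch that derives both sort keys for one name. The two mapping tables are hypothetical and cover only the German umlauts; the real lists are in umlaut.h, and the actual comparison in gender.c may differ in detail.

```python
# Hypothetical, incomplete mappings for illustration only.
EXPANDED = {"Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ä": "ae", "ö": "oe", "ü": "ue"}
COMPRESSED = {"Ä": "A", "Ö": "O", "Ü": "U", "ä": "a", "ö": "o", "ü": "u"}


def sort_keys(name):
    """Return the (expanded, compressed) sort keys for a name."""
    expanded = "".join(EXPANDED.get(ch, ch) for ch in name)
    compressed = "".join(COMPRESSED.get(ch, ch) for ch in name)
    return expanded, compressed


print(sort_keys("Özlem"))  # ('Oezlem', 'Ozlem')
print(sort_keys("Anna"))   # ('Anna', 'Anna')
```

A name whose two keys differ would be listed once at each position; a name without umlauts has identical keys and needs only one entry.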

You don't have to reformat this file for use in a unix environment,
because the DOS linefeeds (trailing '\r') are ignored when the file
is read.
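
If you read the file with your own code, the same two points apply: decode it as ISO 8859-1 and drop the trailing '\r'. A minimal Python sketch follows; the function name and the sample bytes are made up for illustration, and the sample does not reproduce the real line layout of "nam_dict.txt".

```python
import io


def read_dictionary_lines(stream):
    """Yield text lines from a binary stream, decoded as ISO 8859-1,
    with both DOS (\\r\\n) and Unix (\\n) line endings removed."""
    for raw in stream:
        yield raw.decode("iso8859-1").rstrip("\r\n")


# Stand-in data; with the real file you would use
#   open("nam_dict.txt", "rb")
sample = io.BytesIO(b"first line\r\nJ\xf6rg\r\nlast line\n")
print(list(read_dictionary_lines(sample)))
```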


========================================================================


A few words on quality of data


The dictionary of first names has been prepared with utmost care.

For example, the Turkish, Indian and Korean names in this dictionary
have all been independently classified by several native speakers.
I also took special care to list only those names which can currently
be found.

The lesson from this?
Any modifications should be done very cautiously (and they must also
adhere to the sorting required by the search algorithm).

For example, knowing that "Sascha" is a boy's name in Germany, I
never assumed the English "Sasha" to be a girl's name.
Knowing that "Jan" is a boy's name in Germany, I never assumed it to be
also an English short form of "Janet". Another case in point is the name
"Esra". This is a boy's name in Germany, but a girl's name in Turkey.

Or consider the following first names:

Ildikó      female Hungarian name
Mitja       male Russian name
Elizaveta   rare name; looks like misspelled "Elizabeta"
Roelf       rare name; looks like German "Rolf" with an erroneous 'e'

Borchert, Oltmann, Sievert, Hartmann   look like common German surnames


The tool is released under the LGPL.


I have created a small GUI for the main functions of gender.exe.
The script should be stored in the same directory as gender.exe.


#NoTrayIcon
SetWorkingDir %A_ScriptDir%
;------------auto-execute----------------------------------------------------
IfNotExist, gender.exe
{
    MsgBox, gender.exe not found!
    ExitApp
}
IfNotExist, nam_dict.txt
{
    MsgBox, nam_dict.txt not found!
    ExitApp
}

Gui, +Resize
Gui, Margin, 3, 3
Gui, Add, Tab, w284 h260 vMyTab, Get Gender|Check Nickname|List Names|Statistics

Gui, Tab, Get Gender
Gui, Font, S11
Gui, Add, Edit, x8 y30 R1 W270 vMyNameString
Gui, Font, S8
Gui, Add, Button, Default gCheckGender x8 y+5, &Check Gender
Gui, Add, Button, gCheckGenderTrace x+38, Check Gender (Display&Trace)
Gui, Font, S11
Gui, Add, Edit, R9 W270 x8 y+5 vMyResultField ReadOnly
Gui, Font, S8
Gui, Add, Checkbox, x8 y+5 vUseHotkey gActivateHotkey, Use Alt+G to check selected text for gender

Gui, Tab, Check Nickname
Gui, Font, S11
Gui, Add, Text, x8 y33, Name 1:
Gui, Add, Edit, x58 y30 R1 W220 vMyNickAString
Gui, Add, Text, x8 y63, Name 2:
Gui, Add, Edit, x58 y60 R1 W220 vMyNickBString
Gui, Font, S8
Gui, Add, Button, gCheckNick x59 y90, Check, if two first &Names are "equivalent"
Gui, Font, S11
Gui, Add, Edit, R8 W270 x8 y+5 vMyNickResultField ReadOnly

Gui, Tab, List Names
Gui, Add, Text, x8 y33, Country :
Gui, Add, Edit, x60 y30 R1 W218 vMyCountryString
Gui, Font, S8
Gui, Add, Button, gListNames x61 y60, &List all names of the given country.
Gui, Font, S11
Gui, Add, Edit, R10 W270 x8 y+5 vMyCountryResultField ReadOnly

Gui, Tab, Statistics
Gui, Font, S8
Gui, Add, Button, gShowStats x8 y33, &Show statistics
Gui, Font, S11
Gui, Add, Edit, R11 W270 x8 y+7 vMyStatResultField ReadOnly

Gui, Show, , Gender Verification
return
;--------------End-auto-execute----------------------------------------------

;--------------gender.exe related--------------------------------------------
CheckGender:
Gui, Submit, Nohide
Gui +Disabled
Gui, Flash
StringLeft, MyNameString, MyNameString, 100
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-get_gender" "%MyNameString% " >"RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
	GuiControl, , MyResultField, Calling gender.exe produced an error!
else
{
	FileRead, MyResult, Result.txt
	GuiControl, , MyResultField, %MyResult%
}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
return


CheckGenderTrace:
Gui, Submit, Nohide
Gui +Disabled
Gui, Flash
StringLeft, MyNameString, MyNameString, 100
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-get_gender" "%MyNameString% " "-trace" >"RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
	GuiControl, , MyResultField, Calling gender.exe produced an error!
else
{
	FileRead, MyResult, Result.txt
	GuiControl, , MyResultField, %MyResult%
}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
return


CheckSelectedforGender:
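ClipSaved := ClipboardAll
Clipboard =  ; Empty the clipboard so that ClipWait can detect the new copy.
Send ^c
ClipWait, 4
if ErrorLevel
{
    Clipboard := ClipSaved  ; Restore the original clipboard before bailing out.
    ClipSaved =
    GuiControl, , MyResultField, The attempt to copy text onto the clipboard failed.
    return
}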
Loop, parse, Clipboard, `n, `r  ; Specifying `n prior to `r allows both Windows and Unix files to be parsed.
{
    MyNameString := A_LoopField
    break
}
StringLeft, MyNameString, MyNameString, 100
Gui +Disabled
Gui, Flash
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-get_gender" "%MyNameString% " >"RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
	GuiControl, , MyResultField, Calling gender.exe produced an error!
else
{
	FileRead, MyResult, Result.txt
	GuiControl, , MyNameString, %MyNameString%
	GuiControl, , MyResultField, %MyResult%
	GuiControl, , MyNickAString, %MyNameString%
	ToolTip, %MyResult%
	SetTimer, RemoveToolTip, 5000

}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
Clipboard := ClipSaved
ClipSaved =
return

RemoveToolTip:
SetTimer, RemoveToolTip, Off
ToolTip
return


CheckNick:
Gui +Disabled
Gui, Flash
Gui, Submit, Nohide
; Truncate each name in place; the command line below uses these variables.
StringLeft, MyNickAString, MyNickAString, 100
StringLeft, MyNickBString, MyNickBString, 100
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-check_nickname" "%MyNickAString% " "%MyNickBString% " >"RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
	GuiControl, , MyNickResultField, Calling gender.exe produced an error!
else
{
	FileRead, MyResult, Result.txt
	GuiControl, , MyNickResultField, %MyResult%
}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
return


ListNames:
Gui, Submit, Nohide
Gui +Disabled
Gui, Flash
StringLeft, MyCountryString, MyCountryString, 100  ; Truncate in place; used in the command line below.
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-print_names_of_country" "%MyCountryString%" "RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
	GuiControl, , MyCountryResultField, Calling gender.exe produced an error!
else
{
	FileRead, MyResult, Result.txt
	GuiControl, , MyCountryResultField, %MyResult%
}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
return


ShowStats:
Gui +Disabled
Gui, Flash
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-statistics" >"RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
	GuiControl, , MyStatResultField, Calling gender.exe produced an error!
else
{
	FileRead, MyResult, Result.txt
	GuiControl, , MyStatResultField, %MyResult%
}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
return
;---------------End-gender.exe related----------------------------------------

;---------------Hotkey related------------------------------------------------
ActivateHotkey:
Gui, Submit, Nohide
if (UseHotkey = 1)
{
	Hotkey, !g, CheckSelectedforGender, ON

}
if (UseHotkey = 0)
{
	Hotkey, !g, CheckSelectedforGender, OFF
}
return
;---------------End-Hotkey related--------------------------------------------

;---------------Gui related---------------------------------------------------
GuiSize:
if (A_EventInfo != 1)
{
	if (A_GuiWidth < 290)
		Gui, Show, w290
	if (A_GuiHeight < 260)
		Gui, Show, h260
}
GuiControl, Move, MyTab, % "w" A_GuiWidth-6 "h" A_GuiHeight-6 
GuiControl, Move, MyResultField, % "w" A_GuiWidth-15 "h" A_GuiHeight-115 
GuiControl, Move, UseHotkey, % "y" A_GuiHeight-25
GuiControl, Move, MyNickResultField, % "w" A_GuiWidth-15 "h" A_GuiHeight-130 
GuiControl, Move, MyCountryResultField, % "w" A_GuiWidth-15 "h" A_GuiHeight-100 
GuiControl, Move, MyStatResultField, % "w" A_GuiWidth-15 "h" A_GuiHeight-75 
return

GuiClose:
ExitApp

;---------------End-Gui related-----------------------------------------------

The exe- and the ahk-file can be downloaded here: https://ahknet.autoh... ... ender2.zip
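
For anyone who wants to call gender.exe from another language, the script's approach (run the tool, collect its output) can be sketched in Python without the temporary RESULT.TXT. This is only a sketch: the function names are made up, the options "-get_gender" and "-trace" are taken from the script above, and the output encoding is an assumption. As with the script, gender.exe has to find nam_dict.txt, so run it from the directory that contains both files.

```python
import subprocess


def build_command(exe, name, trace=False):
    """Build the argument list for a gender lookup (options as used in the script above)."""
    cmd = [exe, "-get_gender", name]
    if trace:
        cmd.append("-trace")
    return cmd


def get_gender(name, exe="gender.exe", trace=False):
    """Run gender.exe and return whatever it prints to standard output."""
    # Limit the name to 100 characters, as the AutoHotkey script does.
    completed = subprocess.run(
        build_command(exe, name[:100], trace),
        capture_output=True,
        check=False,
    )
    # Assumption: the tool writes ISO 8859-1, like its dictionary file.
    return completed.stdout.decode("iso8859-1", errors="replace")


print(build_command("gender.exe", "Jan", trace=True))
```

Passing the arguments as a list avoids the nested quoting that the %comspec% call needs.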

Zed Gecko
The author of gender.exe has published a new version.
You can download it
from http://www.heise.de/ct/ftp/07/17/182/
or from https://ahknet.autoh...er/0717-182.zip

WinGender was updated to work with the new version (see link and code above).
Code removed due to protest.
http://www.autohotke...pic.php?t=81795

SoLong&Thx4AllTheFish
Very useful, thanks.