AutoHotkey Community

Posted: August 12th, 2007, 6:14 pm

Joined: September 23rd, 2006, 1:58 pm
Posts: 149
A German magazine recently published a tool that decides whether a first name is female or male.

ftp://ftp.heise.de/pub/ct/listings/0717-182.zip
www.heise.de/ct soft-link 0717182

The zip file contains a command-line tool, the C source, and a text file with the match data for about 40,000 names.
Quote:
Overview of the program "gender" by Jörg MICHAEL


The program "gender.c" is a program for determining the gender of a given
first name.

List of files:

a) gen_ext.h (contains macros and prototypes; may be changed)
b) umlaut.h (contains lists of umlauts)
c) gender.c (this is the "workhorse" of the program)
d) nam_dict.txt (dictionary file containing first names)

The file "nam_dict.txt", which contains a list of first names, uses the
char set "iso8859-1".

If you want to use "gender.c" as a library, delete the line
"#define GENDER_EXECUTABLE" from the file "gen_ext.h".


========================================================================


The dictionary file "nam_dict.txt"


The program "gender.c" uses the dictionary file "nam_dict.txt" as a data
source. This file contains a list of more than 40,000 first names and
gender, plus some 600 pairs of "equivalent" names.
This list should be able to cover the vast majority of first names
in all European countries and in some overseas countries (e.g. China,
India, Japan, U.S.A.) as well.

Also included in this file is information on the approximate frequency
of each name. The scale goes from 1 (=rare) to 13 (=extremely common).
The value 10 has been chosen to represent at least 2 percent of
the population. (The values 11 to 13 were added last.)
The scale is logarithmic. For countries with very good statistics,
each step (down to frequency 2) represents a factor of 2.
For example, a frequency value of 7 means that the corresponding first
name has an absolute frequency in the range of 0.25 % to 0.5 %.

The sorting order of the file "nam_dict.txt" is governed by the search
algorithm of the program "gender.c". Hence, names with "expandable"
umlauts can be found twice in this dictionary, first with sorting
according to "expanded" umlauts, and second with sorting according to
"compressed" umlauts (e.g. 'Ö' is sorted like "Oe" and 'O').

You don't have to reformat this file for use in a unix environment,
because the DOS linefeeds (trailing '\r') are ignored when the file
is read.


========================================================================


A few words on quality of data


The dictionary of first names has been prepared with utmost care.

For example, the Turkish, Indian and Korean names in this dictionary
have all been independently classified by several native speakers.
I also took special care to list only those names which can currently
be found.

The lesson from this?
Any modifications should be done very cautiously (and they must also
adhere to the sorting required by the search algorithm).

For example, knowing that "Sascha" is a boy's name in Germany, I
never assumed the English "Sasha" to be a girl's name.
Knowing that "Jan" is a boy's name in Germany, I never assumed it to be
also an English short form of "Janet". Another case in point is the name
"Esra". This is a boy's name in Germany, but a girl's name in Turkey.

Or consider the following first names:

Ildikó      female Hungarian name
Mitja       male Russian name
Elizaveta   rare name; looks like misspelled "Elizabeta"
Roelf       rare name; looks like German "Rolf" with an erroneous 'e'

Borchert, Oltmann, Sievert, Hartmann   look like common German surnames



The tool is released under the LGPL.


I have created a small GUI for the main functions of gender.exe.
The script must be stored in the same directory as gender.exe and nam_dict.txt.


Code:
#NoTrayIcon
SetWorkingDir %A_ScriptDir%
;------------auto-execute----------------------------------------------------
IfNotExist, gender.exe
{
    MsgBox, gender.exe not found!
    ExitApp
}
IfNotExist, nam_dict.txt
{
    MsgBox, nam_dict.txt not found!
    ExitApp
}

Gui, +Resize
Gui, Margin, 3, 3
Gui, Add, Tab, w284 h260 vMyTab, Get Gender|Check Nickname|List Names|Statistics

Gui, Tab, Get Gender
Gui, Font, S11
Gui, Add, Edit, x8 y30 R1 W270 vMyNameString
Gui, Font, S8
Gui, Add, Button, Default gCheckGender x8 y+5, &Check Gender
Gui, Add, Button, gCheckGenderTrace x+38, Check Gender (Display&Trace)
Gui, Font, S11
Gui, Add, Edit, R9 W270 x8 y+5 vMyResultField ReadOnly
Gui, Font, S8
Gui, Add, Checkbox, x8 y+5 vUseHotkey gActivateHotkey, Use Alt+G to check selected text for gender

Gui, Tab, Check Nickname
Gui, Font, S11
Gui, Add, Text, x8 y33, Name 1:
Gui, Add, Edit, x58 y30 R1 W220 vMyNickAString
Gui, Add, Text, x8 y63, Name 2:
Gui, Add, Edit, x58 y60 R1 W220 vMyNickBString
Gui, Font, S8
Gui, Add, Button, gCheckNick x59 y90, Check, if two first &Names are "equivalent"
Gui, Font, S11
Gui, Add, Edit, R8 W270 x8 y+5 vMyNickResultField ReadOnly

Gui, Tab, List Names
Gui, Add, Text, x8 y33, Country :
Gui, Add, Edit, x60 y30 R1 W218 vMyCountryString
Gui, Font, S8
Gui, Add, Button, gListNames x61 y60, &List all names of the given country.
Gui, Font, S11
Gui, Add, Edit, R10 W270 x8 y+5 vMyCountryResultField ReadOnly

Gui, Tab, Statistics
Gui, Font, S8
Gui, Add, Button, gShowStats x8 y33, &Show statistics
Gui, Font, S11
Gui, Add, Edit, R11 W270 x8 y+7 vMyStatResultField ReadOnly

Gui, Show, , Gender Verification
return
;--------------End-auto-execute----------------------------------------------

;--------------gender.exe related--------------------------------------------
CheckGender:
Gui, Submit, Nohide
Gui +Disabled
Gui, Flash
StringLeft, MyNameString, MyNameString, 100
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-get_gender" "%MyNameString% " >"RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
   GuiControl, , MyResultField, Calling gender.exe produced an error!
else
{
   FileRead, MyResult, Result.txt
   GuiControl, , MyResultField, %MyResult%
}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
return


CheckGenderTrace:
Gui, Submit, Nohide
Gui +Disabled
Gui, Flash
StringLeft, MyNameString, MyNameString, 100
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-get_gender" "%MyNameString% " "-trace" >"RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
   GuiControl, , MyResultField, Calling gender.exe produced an error!
else
{
   FileRead, MyResult, Result.txt
   GuiControl, , MyResultField, %MyResult%
}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
return


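CheckSelectedforGender:
ClipSaved := ClipboardAll
Clipboard =  ; Empty the clipboard so that ClipWait can detect the newly copied text.
Send ^c
ClipWait, 4
if ErrorLevel
{
    GuiControl, , MyResultField, The attempt to copy text onto the clipboard failed.
    Clipboard := ClipSaved  ; Restore the original clipboard before leaving.
    ClipSaved =
    return
}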
Loop, parse, Clipboard, `n, `r  ; Specifying `n prior to `r allows both Windows and Unix files to be parsed.
{
    MyNameString := A_LoopField
    break
}
StringLeft, MyNameString, MyNameString, 100
Gui +Disabled
Gui, Flash
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-get_gender" "%MyNameString% " >"RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
   GuiControl, , MyResultField, Calling gender.exe produced an error!
else
{
   FileRead, MyResult, Result.txt
   GuiControl, , MyNameString, %MyNameString%
   GuiControl, , MyResultField, %MyResult%
   GuiControl, , MyNickAString, %MyNameString%
   ToolTip, %MyResult%
   SetTimer, RemoveToolTip, 5000

}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
Clipboard := ClipSaved
ClipSaved =
return

RemoveToolTip:
SetTimer, RemoveToolTip, Off
ToolTip
return


CheckNick:
Gui +Disabled
Gui, Flash
Gui, Submit, Nohide
; Limit both names to 100 characters before passing them to gender.exe.
StringLeft, MyNickAString, MyNickAString, 100
StringLeft, MyNickBString, MyNickBString, 100
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-check_nickname" "%MyNickAString% " "%MyNickBString% " >"RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
   GuiControl, , MyNickResultField, Calling gender.exe produced an error!
else
{
   FileRead, MyResult, Result.txt
   GuiControl, , MyNickResultField, %MyResult%
}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
return


ListNames:
Gui, Submit, Nohide
Gui +Disabled
Gui, Flash
StringLeft, MyCountryString, MyCountryString, 100  ; Limit the country name to 100 characters.
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-print_names_of_country" "%MyCountryString%" "RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
   GuiControl, , MyCountryResultField, Calling gender.exe produced an error!
else
{
   FileRead, MyResult, Result.txt
   GuiControl, , MyCountryResultField, %MyResult%
}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
return


ShowStats:
Gui +Disabled
Gui, Flash
RunWait, %comspec% /c ""%A_WorkingDir%\gender.exe" "-statistics" >"RESULT.TXT"", , Hide UseErrorlevel
if ErrorLevel = ERROR
   GuiControl, , MyStatResultField, Calling gender.exe produced an error!
else
{
   FileRead, MyResult, Result.txt
   GuiControl, , MyStatResultField, %MyResult%
}
Gui -Disabled
Gui, Flash
Gui, Flash, Off
FileDelete, Result.txt
return
;---------------End-gender.exe related----------------------------------------

;---------------Hotkey related------------------------------------------------
ActivateHotkey:
Gui, Submit, Nohide
if (UseHotkey = 1)
{
   Hotkey, !g, CheckSelectedforGender, ON

}
if (UseHotkey = 0)
{
   Hotkey, !g, CheckSelectedforGender, OFF
}
return
;---------------End-Hotkey related--------------------------------------------

;---------------Gui related---------------------------------------------------
GuiSize:
if (A_EventInfo != 1)
{
   if (A_GuiWidth < 290)
      Gui, Show, w290
   if (A_GuiHeight < 260)
      Gui, Show, h260
}
GuiControl, Move, MyTab, % "w" A_GuiWidth-6 " h" A_GuiHeight-6
GuiControl, Move, MyResultField, % "w" A_GuiWidth-15 " h" A_GuiHeight-115
GuiControl, Move, UseHotkey, % "y" A_GuiHeight-25
GuiControl, Move, MyNickResultField, % "w" A_GuiWidth-15 " h" A_GuiHeight-130
GuiControl, Move, MyCountryResultField, % "w" A_GuiWidth-15 " h" A_GuiHeight-100
GuiControl, Move, MyStatResultField, % "w" A_GuiWidth-15 " h" A_GuiHeight-75
return

GuiClose:
ExitApp

;---------------End-Gui related-----------------------------------------------


The exe- and the ahk-file can be downloaded here: http://www.autohotkey.net/~Zed_Gecko/ge ... ender2.zip


Last edited by Zed Gecko on January 9th, 2009, 6:38 pm, edited 2 times in total.

Posted: January 9th, 2009, 6:16 pm
The author of gender.exe has published a new version.
You can download it
from http://www.heise.de/ct/ftp/07/17/182/
or from http://www.autohotkey.net/~Zed_Gecko/gender/0717-182.zip

WinGender has been updated to work with the new version (see the link and code above).



Posted: January 10th, 2009, 1:25 pm

Joined: May 27th, 2007, 9:41 am
Posts: 4999
Very useful, thanks.


