BoBo Guest
Posted: Thu Jun 29, 2006 11:17 pm Post subject: UniConv - Convert unicode [CMD]
Quote:
The 32-bit binary of Basis Technology's UNICONV utility, which converts between most East Asian code-sets and Unicode.
uniconv.inf (2633 bytes) Some brief explanation from the "help" printouts.
[uniconv.txt] (7037 bytes) The words from the "help" screen.
[uniconv.zip] (726772 bytes) The Windows NT/95/98 binary and DLL.
uniconv_old.exe (835072 bytes) The previous Windows version (no DLL)
The Sun (Solaris 2.5), HPUX and Macintosh binaries are available too.
Uniconv Help
------------------------------------------------------------------------
Uniconv is a command line utility that uses the Basis Technology C++
Library for Unicode for converting text between encodings and optionally
applying transforms to it.
Usage :
Uniconv converts a text file written in a given encoding to another of
its accepted encodings (see Encodings below for the accepted names). It
uses a command line interface, with the following usage:
uniconv [-options] <input-encoding> <input-file> <output-encoding> <output-file> [property | transform]*
uniconv
Name of the program to run.
input-encoding required
List the encoding of the input file. Encoding name must be
written in the way listed below.
input-file required
List the name of the file (if in the current directory) or the
path and file name of the file (if not in the current directory)
to be converted.
output-encoding required
List the desired encoding of the output file. Encoding name must
be written in the way listed below.
output-file required
List the name of the file to be created in the new encoding (if
in the current directory) or the path and file name of the new
file (if not in the current directory).
property optional
Returns a true or false value for each character. A property is
associated with the transform that follows it. Properties not
followed by a transform are ignored. Multiple property-transform
pairs are OK. Multiple properties per transform are also OK. See
Character Properties for more information about how to use
properties, and see below for a quick reference of the
properties available.
transform optional
Changes a property value for designated characters in a file.
Multiple transforms are OK. See Transforms for more information
about how to use transforms, and see below for a quick reference
of the transforms available.
options:
Use these flags at the beginning of the command line, before you
specify the input and output encodings and filenames.
-debug optional
This option will print messages generated by Auto-detect. For
example, if you are converting a Japanese file and the input
encoding is japaneseautodetect, uniconv will list the encodings
it is attempting (sjis, euc-j, etc.) and the results.
-help optional
Displays the copyright information.
-subst optional
Allows you to change the default substitution character. The
substitution character is the character that is used if there is
no direct mapping between characters in a conversion. The
default substitution character is CTRL-Z.
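The idea of a substitution character can be illustrated outside of uniconv. The sketch below uses Python's own codec machinery, not the Basis Technology library, and the handler name "ctrlz" and the function convert_with_substitution are made up for this example. It replaces every character that has no mapping in the target encoding with CTRL-Z (0x1A), the default named above. How uniconv's -subst flag takes its argument is not shown in the help text, so nothing here models that.

```python
import codecs

# CTRL-Z (0x1A), the default substitution character named in the help text.
SUBSTITUTE = "\x1a"


def _substitute_ctrl_z(exc):
    """Codec error handler: emit one CTRL-Z per unmappable character."""
    if isinstance(exc, UnicodeEncodeError):
        # exc.start:exc.end spans the run of characters with no mapping.
        return SUBSTITUTE * (exc.end - exc.start), exc.end
    raise exc


# "ctrlz" is a hypothetical handler name chosen for this sketch.
codecs.register_error("ctrlz", _substitute_ctrl_z)


def convert_with_substitution(text, target="ascii"):
    """Encode text, substituting CTRL-Z where the target has no mapping."""
    return text.encode(target, errors="ctrlz")


if __name__ == "__main__":
    print(convert_with_substitution("caf\u00e9"))
```

Characters that do map are passed through unchanged, so only the gaps in the target encoding show up as 0x1A bytes in the output.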
Notes
- All command line arguments are case insensitive.
- Separate properties and transforms with a space.
- If there are multiple properties or transforms, they will be
performed in the order listed.
- The options -debug, -help, -subst, if used, must directly
follow "uniconv".
- The * in the usage line means that any number of properties or
transforms may be listed.
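The argument order described above can be sketched as a small helper that assembles a command line. This is only an illustration: the function build_uniconv_argv and the file names in.txt and out.txt are hypothetical, and the Katakana/ToHiragana pair is picked from the quick-reference lists simply to show where a property and its transform go.

```python
def build_uniconv_argv(input_encoding, input_file,
                       output_encoding, output_file,
                       options=(), pairs=()):
    """Assemble arguments in the order given by the usage line."""
    for opt in options:
        if not opt.startswith("-"):
            raise ValueError(f"option must start with '-': {opt!r}")
    argv = ["uniconv"]
    # Options, if used, must directly follow "uniconv".
    argv.extend(options)
    # The four required arguments, in their fixed order.
    argv.extend([input_encoding, input_file, output_encoding, output_file])
    # Zero or more properties and transforms, space separated,
    # listed in the order they should be performed.
    argv.extend(pairs)
    return argv


if __name__ == "__main__":
    argv = build_uniconv_argv(
        "Shift-JIS", "in.txt", "UTF8", "out.txt",   # hypothetical files
        options=["-debug"],
        pairs=["Katakana", "ToHiragana"],
    )
    print(" ".join(argv))
    # prints: uniconv -debug Shift-JIS in.txt UTF8 out.txt Katakana ToHiragana
```

The resulting list could be handed to subprocess.run if a uniconv binary is on the PATH, but that step is left out because it depends on the local installation.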
Encodings :
Quick Reference: Accepted Encodings
Arabic, ASCII, Big5, BMP, ChineseAutoDetect, cp1251, cp1252, cp437, cp850,
EUC-J, EUC-KR, GB2312, Greek, Hebrew, HZ, ISO-2022-JP, ISO-2022-KR,
ISOLatinCyrillic, JapaneseAutoDetect, JIS_X0201, JIS_X_0208,
KoreanAutoDetect, Latin1, Latin2, Latin3, Latin4, Latin5, Latin6,
Shift-JIS, Thai, UCS2, Unicode11UCS2, Unicode11UTF7, Unicode11UTF8, UTF7,
UTF8
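Since the notes say arguments are case insensitive but the encoding names must be written as listed, a wrapper script might check a name against the list before calling uniconv. The sketch below is one way to do that; the function canonical_encoding is hypothetical, and the list is copied from the quick reference above exactly as printed.

```python
# Names copied from the quick-reference list above, spelled as printed.
ACCEPTED_ENCODINGS = [
    "Arabic", "ASCII", "Big5", "BMP", "ChineseAutoDetect", "cp1251",
    "cp1252", "cp437", "cp850", "EUC-J", "EUC-KR", "GB2312", "Greek",
    "Hebrew", "HZ", "ISO-2022-JP", "ISO-2022-KR", "ISOLatinCyrillic",
    "JapaneseAutoDetect", "JIS_X0201", "JIS_X_0208", "KoreanAutoDetect",
    "Latin1", "Latin2", "Latin3", "Latin4", "Latin5", "Latin6",
    "Shift-JIS", "Thai", "UCS2", "Unicode11UCS2", "Unicode11UTF7",
    "Unicode11UTF8", "UTF7", "UTF8",
]

# Lookup table keyed by lowercase name, for case-insensitive matching.
_BY_LOWER = {name.lower(): name for name in ACCEPTED_ENCODINGS}


def canonical_encoding(name):
    """Return the listed spelling of an encoding name, ignoring case."""
    try:
        return _BY_LOWER[name.strip().lower()]
    except KeyError:
        raise ValueError(f"not in the accepted encodings list: {name!r}") from None


if __name__ == "__main__":
    print(canonical_encoding("shift-jis"))  # prints: Shift-JIS
```

Note that only case is normalized. A spelling such as "utf-8" is rejected, because the list gives the name as UTF8 without a hyphen.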
Accepted Properties:
UppercaseLetter, LowercaseLetter, TitlecaseLetter, ModifierLetter,
OtherLetter, AnyLetter, NonSpacingMark, CombiningMark, DecimalNumber,
OtherNumber, DashPunctuation, OpenPunctuation, ClosePunctuation,
OtherPunctuation, MathSymbol, CurrencySymbol, OtherSymbol, SpaceSeparator,
LineSeparator, ParagraphSeparator, ControlCharacter, OtherCharacter,
UndefinedScript, GeneralScript, Latin, Greek, Cyrillic, Armenian, Hebrew,
Arabic, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu,
Kannada, Malayalam, Thai, Lao, Tibetan, Georgian, HangulJamo, Hiragana,
Katakana, Kana, Bopomofo, CJKUnifiedIdeographs, Hangul, UndefinedWidth,
Fullwidth, Halfwidth
Accepted Transforms :
ToLowercase, ToUppercase, ToFullwidth, ToHalfwidth, ToHiragana,
ToKatakana, Decompose, Compose, ToCombiningMark, ToSpacingMark, Select,
Filter, ToCRLF, ToCR, ToLF, ToParagraphSeparator, ToLineSeparator,
ToCanonical, ToTraditionalChinese, ToSimplifiedChinese, RomajiToHiragana,
RomajiToKatakana, KanaToRomaji, ToLatinNumber, SGMLEntity
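To make the property and transform idea concrete, here is a rough Python sketch. It assumes that a property acts as a true/false test on each character and that the transform following it is applied only to characters that pass the test; the help text says the two are associated but does not spell this out, so treat it as one reading. The helper names are hypothetical and the code uses Python's unicodedata module, not uniconv itself.

```python
import unicodedata


def is_uppercase_letter(ch):
    """Rough analogue of the UppercaseLetter property (category Lu)."""
    return unicodedata.category(ch) == "Lu"


def is_fullwidth(ch):
    """Rough analogue of the Fullwidth property (East Asian width F)."""
    return unicodedata.east_asian_width(ch) == "F"


def to_halfwidth(ch):
    """Rough analogue of ToHalfwidth, via compatibility normalization."""
    return unicodedata.normalize("NFKC", ch)


def apply_where(text, prop, transform):
    """Apply transform only to characters for which prop is true."""
    return "".join(transform(ch) if prop(ch) else ch for ch in text)


if __name__ == "__main__":
    # Analogue of the pair "UppercaseLetter ToLowercase".
    print(apply_where("Hello World 123", is_uppercase_letter, str.lower))
    # Analogue of the pair "Fullwidth ToHalfwidth".
    print(apply_where("\uff21\uff22\uff23 abc", is_fullwidth, to_halfwidth))
```

Characters that fail the test are copied through untouched, which matches the note that a property with no transform after it has no effect.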