Tlk2xml
BioWare TLK to XML converter
Synopsis
tlk2xml [<options>] <input file> [<output file>]
Description
tlk2xml converts BioWare's TLK files into human-readable XML. TLK files are "talk tables": lists of strings, each indexed by an ID, used for all user-visible text in a BioWare game. All strings for a campaign or module are usually collected in one file per supported language. Languages in which sentences differ substantially depending on whether the player character is male or female use a second TLK that holds the strings for the female version.
There are two distinct TLK formats. One is a standalone file format (which uses version IDs V3.0 and V4.0), the other is a GFF (and uses version IDs V0.2 and V0.5). Within each of those two major formats, the differences are smaller: V4.0 removed per-string fields that were no longer needed, and V0.5 compresses strings using a Huffman tree. This tool can read all of these variants and produces a human-readable XML file.
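As a rough illustration of how the four variants could be told apart, the following sketch inspects the first bytes of a file. The byte layout used here is an assumption that this manual does not state: standalone files are taken to start with "TLK " followed by the version ID, and GFF-based files are taken to start with "GFF " and to carry "TLK " plus the version ID at offset 12. The function name sniff_tlk_variant is hypothetical and is not part of tlk2xml.

```python
def sniff_tlk_variant(header: bytes) -> str:
    """Guess the TLK variant from the first 20 bytes of a file.

    Assumed layout (not taken from this manual):
      standalone: b"TLK " + 4-byte version ID at offset 4
      GFF-based:  b"GFF " at offset 0, b"TLK " at offset 12,
                  4-byte version ID at offset 16
    """
    if header[:4] == b"TLK ":
        version = header[4:8]
        if version in (b"V3.0", b"V4.0"):
            return "standalone " + version.decode("ascii")
    elif header[:4] == b"GFF " and header[12:16] == b"TLK ":
        version = header[16:20]
        if version in (b"V0.2", b"V0.5"):
            return "GFF " + version.decode("ascii")
    return "unknown"


def sniff_tlk_file(path: str) -> str:
    """Read just enough of a file to classify it."""
    with open(path, "rb") as handle:
        return sniff_tlk_variant(handle.read(20))
```

tlk2xml performs this detection itself; the sketch is only meant to make the distinction between the two major formats concrete.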
Because these files contain localized string data, it is important to know the encoding of those strings. Unfortunately, the TLK files do not contain information about the encoding. V3.0 and V4.0 files contain a language identifier, but its meaning varies between games. V0.2 and V0.5 files lack even that. However, because V0.5 strings are Huffman-compressed, their encoding is fixed to little-endian UTF-16, and strings in V0.2 files are also usually in little-endian UTF-16 (with the exception of files found in the Nintendo DS game Sonic Chronicles: The Dark Brotherhood). To manually select the encoding, this tool provides a wide range of command-line options for various encodings.
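To see why the encoding has to be known, consider the following minimal sketch. It decodes the same two bytes with Python's built-in cp1252 and cp1251 codecs, which are used here as stand-ins for the --cp1252 and --cp1251 options; whether they match tlk2xml's own conversion tables byte for byte is an assumption. The helper name decode_candidates is hypothetical.

```python
def decode_candidates(raw: bytes, encodings):
    """Decode the same bytes under several encodings.

    Returns a dict mapping each encoding name to the decoded text,
    or to None if the bytes are not valid in that encoding.
    """
    readings = {}
    for name in encodings:
        try:
            readings[name] = raw.decode(name)
        except UnicodeDecodeError:
            readings[name] = None
    return readings


# Two bytes with no encoding information attached, as in a TLK string.
raw = bytes([0xC4, 0xE0])

readings = decode_candidates(raw, ["cp1252", "cp1251"])
# Under cp1252 the bytes read as Latin letters with diacritics;
# under cp1251 the same bytes read as the Russian word for "yes".
for name, text in readings.items():
    print(name, repr(text))
```

Both readings are valid byte sequences in their respective encodings, so nothing in the data itself reveals which one was intended. That is why the encoding must be selected either explicitly or through a game option.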
Alternatively, the game the TLK comes from can be specified, and tlk2xml will then read the strings in an encoding appropriate for that game and the language ID found in the TLK. Please note that this does not work for the game Sonic Chronicles: The Dark Brotherhood, since its TLK files do not provide a language ID.
Options
-h
--help
- Show a help text and exit.
--version
- Show version information and exit.
--cp1250
- Read strings as Windows CP-1250. Eastern European, Latin alphabet.
--cp1251
- Read strings as Windows CP-1251. Eastern European, Cyrillic alphabet.
--cp1252
- Read strings as Windows CP-1252. Western European, Latin alphabet.
--cp932
- Read strings as Windows CP-932. Japanese, extended Shift-JIS.
--cp936
- Read strings as Windows CP-936. Simplified Chinese, extended GB2312 with GBK codepoints.
--cp949
- Read strings as Windows CP-949. Korean, similar to EUC-KR.
--cp950
- Read strings as Windows CP-950. Traditional Chinese, similar to Big5.
--utf8
- Read strings as UTF-8.
--utf16le
- Read strings as little-endian UTF-16.
--utf16be
- Read strings as big-endian UTF-16.
--nwn
- Read strings in an encoding appropriate for Neverwinter Nights.
--nwn2
- Read strings in an encoding appropriate for Neverwinter Nights 2.
--kotor
- Read strings in an encoding appropriate for Knights of the Old Republic.
--kotor2
- Read strings in an encoding appropriate for Knights of the Old Republic II: The Sith Lords.
--jade
- Read strings in an encoding appropriate for Jade Empire.
--witcher
- Read strings in an encoding appropriate for The Witcher.
--dragonage
- Read strings in an encoding appropriate for Dragon Age: Origins.
--dragonage2
- Read strings in an encoding appropriate for Dragon Age II.
<input file>
- The TLK file to convert.
[<output file>]
- The file the XML is written to. If no output file is specified, the XML data is written to stdout. The encoding of the XML stream is always UTF-8.
Examples
Convert the CP-1252 TLK file1.tlk into an XML file:
tlk2xml --cp1252 file1.tlk file2.xml
Convert the UTF-16LE TLK file1.tlk into an XML file on stdout:
tlk2xml --utf16le file1.tlk
Convert the TLK file1.tlk from Neverwinter Nights into an XML file:
tlk2xml --nwn file1.tlk file2.xml
Convert the UTF-8 TLK file1.tlk into an XML file on stdout, modify it using sed(1) and use xml2tlk(1) to write it back into a TLK:
tlk2xml --utf8 file1.tlk | sed -e 's/gold/candy/g' | xml2tlk --utf8 --version30 file2.tlk
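The same invocations can also be driven from a script. The following is a minimal sketch, assuming tlk2xml is installed and on the PATH; the helper names tlk2xml_command and convert_to_string are hypothetical. It uses only the encoding and game options listed in this manual, and it relies on the documented behavior that the XML is written to stdout as UTF-8 when no output file is given.

```python
import subprocess

# Encoding and game options as listed in the Options section.
KNOWN_FLAGS = {
    "--cp1250", "--cp1251", "--cp1252", "--cp932", "--cp936",
    "--cp949", "--cp950", "--utf8", "--utf16le", "--utf16be",
    "--nwn", "--nwn2", "--kotor", "--kotor2", "--jade",
    "--witcher", "--dragonage", "--dragonage2",
}


def tlk2xml_command(input_file, flag, output_file=None):
    """Build the argument list for one tlk2xml run."""
    if flag not in KNOWN_FLAGS:
        raise ValueError("not a documented tlk2xml option: " + flag)
    command = ["tlk2xml", flag, input_file]
    if output_file is not None:
        command.append(output_file)
    return command


def convert_to_string(input_file, flag):
    """Run tlk2xml without an output file and return the XML text.

    Requires tlk2xml to be installed; raises CalledProcessError if
    the tool exits with a non-zero status.
    """
    result = subprocess.run(
        tlk2xml_command(input_file, flag),
        check=True,
        capture_output=True,
    )
    # The manual states the XML stream is always UTF-8.
    return result.stdout.decode("utf-8")
```

For instance, convert_to_string("file1.tlk", "--nwn") would correspond to running tlk2xml --nwn file1.tlk and capturing its output.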