Character Set Conversion (xm_charconv)

This module provides tools for converting strings between different character sets (code pages). All encodings available to iconv are supported. On GNU/Linux systems, run iconv -l to list the available encoding names.

For the list of supported platforms, see the installer packages listed in the Available Modules chapter.

Configuration

The xm_charconv module accepts the following directives in addition to the common module directives.

AutodetectCharsets: This optional directive accepts a comma-separated list of character set names. When auto is specified as the source encoding for convert() or convert_fields(), the module tries these character sets for the conversion. This directive is unrelated to the LineReader directive and to the InputType that LineReader registers.

BigEndian: This optional boolean directive specifies the endianness to use during encoding conversion. If this directive is not specified, it defaults to the host’s endianness. This directive only affects the registered InputType and is only applicable if the LineReader directive is set to a non-Unicode encoding and the CharBytes directive is set to 2 or 4.

CharBytes: This optional integer directive specifies the byte width of the encoding to use during conversion. Accepted values are 1 (the default), 2, and 4. Most variable-width encodings work with the default value. This directive only affects the registered InputType and is only applicable if the LineReader directive is set to a non-Unicode encoding.

LineReader: If this optional directive is specified with an encoding, an InputType will be registered using the name of the xm_charconv module instance. The following Unicode encodings are supported: UTF-8, UCS-2, UCS-2BE, UCS-2LE, UCS-4, UCS-4BE, UCS-4LE, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, and UTF-7. For other encodings, it may be necessary to also set the BigEndian and/or CharBytes directives.
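
The following is a minimal sketch of LineReader with one of the supported Unicode encodings. The instance name utf16reader and the file path are hypothetical. Because UTF-16LE appears in the list of supported Unicode encodings above, BigEndian and CharBytes are left unset; the registered InputType takes the instance name, so it is referenced as utf16reader.

nxlog.conf

<Extension utf16reader>
    Module      xm_charconv
    # UTF-16LE is a supported Unicode encoding, so BigEndian/CharBytes are not needed
    LineReader  UTF-16LE
</Extension>

<Input utf16file>
    Module      im_file
    # Hypothetical path to a file written in UTF-16LE
    File        "/var/log/app/utf16le.log"
    # The InputType name is the name of the xm_charconv instance
    InputType   utf16reader
</Input>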

Functions

The following functions are exported by xm_charconv.

string convert(string source, string srcencoding, string dstencoding): Convert the source string from the srcencoding encoding to the dstencoding encoding, and return the result. The srcencoding argument can be set to auto to request auto-detection using the character sets listed in the AutodetectCharsets directive.
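
As an illustration, convert() can be called from an Exec directive to re-encode a single field. This sketch assumes a hypothetical input file encoded in ISO-8859-2 and converts only the $raw_event field to UTF-8; the instance names and path are placeholders, not values required by the module.

nxlog.conf

<Extension charconv>
    Module      xm_charconv
</Extension>

<Input legacy_app>
    Module      im_file
    # Hypothetical file written in ISO-8859-2
    File        "/var/log/legacy/app.log"
    # Re-encode only the raw event text from ISO-8859-2 to UTF-8
    Exec        $raw_event = convert($raw_event, "iso-8859-2", "utf-8");
</Input>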

Procedures

The following procedures are exported by xm_charconv.

convert_fields(string srcencoding, string dstencoding): Convert all string-type fields of a log message from srcencoding to dstencoding. The srcencoding argument can be set to auto to request auto-detection using the character sets listed in the AutodetectCharsets directive.
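
Unlike convert(), which operates on a single string, convert_fields() processes every string field of the event. The sketch below assumes a hypothetical file containing syslog messages encoded in CP1250. It first parses the messages with parse_syslog() from the xm_syslog module, which creates several string fields, and then converts all of them to UTF-8 in one call. The instance names and path are illustrative only.

nxlog.conf

<Extension syslog>
    Module      xm_syslog
</Extension>

<Extension charconv>
    Module      xm_charconv
</Extension>

<Input cp1250_syslog>
    Module      im_file
    # Hypothetical file containing CP1250-encoded syslog lines
    File        "/var/log/legacy/syslog.log"
    # Parse into fields first, then convert every string field to UTF-8
    Exec        parse_syslog(); convert_fields("CP1250", "UTF-8");
</Input>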

Examples

Example 1. Character set auto-detection of various input encodings

This configuration shows an example of character set auto-detection. The input file can contain lines with different encodings, and convert_fields() normalizes every string field to UTF-8.

nxlog.conf

<Extension converter>
    Module              xm_charconv
    AutodetectCharsets  utf-8, euc-jp, utf-16, utf-32, iso8859-2
</Extension>

<Input filein>
    Module              im_file
    File                "tmp/input"
    Exec                convert_fields("auto", "utf-8");
</Input>

Example 2. Registering and using an InputType

This configuration uses the InputType registered via the LineReader directive to read a file with the ISO-8859-2 encoding.

nxlog.conf

<Extension converter>
    Module      xm_charconv
    LineReader  ISO-8859-2
</Extension>

<Input filein>
    Module      im_file
    File        "tmp/input/iso-8859-2.in"
    InputType   converter
</Input>