Character Set Conversion (xm_charconv)

This module provides tools for converting strings between different character sets (code pages). All the encodings available to iconv are supported. On GNU/Linux systems, run iconv -l to list the available encoding names.

The functionality of xm_charconv can be combined with other modules providing data conversion such as xm_crypto or xm_zlib.
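
As a rough sketch of such a combination, the configuration below chains a decompression step in front of the character set conversion. It assumes that xm_zlib exposes a data converter named decompress and that converters may be chained in InputType ahead of the LineBased reader; both points should be checked against the xm_zlib reference. The instance names and the file path are hypothetical.

```
<Extension zlib>
    Module          xm_zlib
</Extension>

<Extension converter>
    Module          xm_charconv
    InputEncoding   ISO-8859-2
</Extension>

<Input filein>
    Module          im_file
    # Hypothetical path to a compressed file holding ISO-8859-2 text
    File            "tmp/input/iso-8859-2.in.gz"
    # Assumed order: decompress, convert to UTF-8, then split into lines
    InputType       zlib.decompress, converter.convert, LineBased
</Input>
```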

To examine the supported platforms, see the list of installation packages.

Configuration

The xm_charconv module accepts the following directives in addition to the common module directives.

Optional directives

AutodetectCharsets

This optional directive accepts a comma-separated list of character set names. When auto is specified as the source encoding for convert() or convert_fields(), these character sets will be tried for conversion. This directive is not related to the InputEncoding or OutputEncoding directives or the corresponding data converter.

InputEncoding

If this optional directive is specified with an encoding, a data converter will be registered to convert from the specified encoding. If this directive is not specified, it defaults to UTF-8.

OutputEncoding

If this optional directive is specified with an encoding, a data converter will be registered to convert to the specified encoding. If this directive is not specified, it defaults to UTF-8.

Functions

The following functions are exported by xm_charconv.

string convert(string source, string srcencoding, string dstencoding)

Convert the source string to the encoding specified in dstencoding from srcencoding. The srcencoding argument can be set to auto to request auto detection.
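
A minimal sketch of calling convert() from an Exec directive might look as follows. The instance names, the file path, and the choice of the $raw_event field are assumptions made for illustration; the encoding names are the ones used elsewhere on this page.

```
<Extension charconv>
    Module  xm_charconv
</Extension>

<Input filein>
    Module  im_file
    # Hypothetical path, adjust to the actual log file
    File    "tmp/input/iso-8859-2.in"
    # Rewrite the raw event in place, converting ISO-8859-2 to UTF-8
    Exec    $raw_event = convert($raw_event, "iso8859-2", "utf-8");
</Input>
```

Because convert() returns the converted string, the result can also be assigned to a different field if the original value needs to be kept.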

Procedures

The following procedures are exported by xm_charconv.

convert_fields(string srcencoding, string dstencoding);

Convert all string type fields of a log message from srcencoding to dstencoding. The srcencoding argument can be set to auto to request auto detection.
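
When the source encoding is known in advance, it can be passed explicitly instead of auto. The following is a sketch under that assumption; the instance names and the file path are hypothetical.

```
<Extension charconv>
    Module  xm_charconv
</Extension>

<Input filein>
    Module  im_file
    # Hypothetical path, adjust to the actual log file
    File    "tmp/input/iso-8859-2.in"
    # Convert every string field of each record from ISO-8859-2 to UTF-8
    Exec    convert_fields("iso8859-2", "utf-8");
</Input>
```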

Data conversion

The xm_charconv module implements a data converter to be used with the im_file module. It is specified in the InputType directive of the im_file module (or, as in Example 4, in the OutputType directive of the om_file module) and is invoked using dot notation:

<InstanceName>.<DataConverterName>

Where <InstanceName> is the given name of the xm_charconv instance and <DataConverterName> is the name of the converter being invoked.

The following data converter is available:

convert

This data converter converts data from the encoding specified in InputEncoding to the encoding specified in OutputEncoding. The converter should be specified in the InputType directive before the input reader function.
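
The sketch below illustrates the dot notation with two separately named instances, one converting on input and one on output. The instance names, the file paths, and the route are hypothetical, and placing the built-in LineBased reader after the converter is an assumption based on the ordering rule stated above.

```
# Input side: UCS-2BE to UTF-8 (OutputEncoding left at its default)
<Extension ucs2_in>
    Module          xm_charconv
    InputEncoding   UCS-2BE
</Extension>

# Output side: UTF-8 (InputEncoding left at its default) to ISO-8859-2
<Extension latin2_out>
    Module          xm_charconv
    OutputEncoding  ISO-8859-2
</Extension>

<Input filein>
    Module          im_file
    File            "tmp/input/ucs-2be.in"
    # Converter first, then the reader function
    InputType       ucs2_in.convert, LineBased
</Input>

<Output fileout>
    Module          om_file
    File            "tmp/iso-8859-2.out"
    OutputType      latin2_out.convert
</Output>

<Route r>
    Path            filein => fileout
</Route>
```

Using one instance per direction keeps each converter's InputEncoding and OutputEncoding pair unambiguous.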

Examples

Example 1. Character set auto-detection of various input encodings

This configuration shows an example of character set auto-detection. The input file can contain lines with different encodings, and the module normalizes output to UTF-8.

nxlog.conf
<Extension converter>
    Module              xm_charconv
    AutodetectCharsets  utf-8, euc-jp, utf-16, utf-32, iso8859-2
</Extension>

<Input filein>
    Module              im_file
    File                "tmp/input"
    Exec                convert_fields("auto", "utf-8");
</Input>

Example 2. Registering and using a data converter in an input module

This configuration uses the data converter registered via the InputEncoding directive to read a file with the ISO-8859-2 encoding.

nxlog.conf
<Extension converter>
    Module          xm_charconv
    InputEncoding   ISO-8859-2
</Extension>

<Input filein>
    Module          im_file
    File            "tmp/input/iso-8859-2.in"
    InputType       converter.convert
</Input>

Example 3. Using a data converter with another InputType

This configuration uses a data converter with xm_multiline as an InputType to read a file with UCS-2BE encoding. Each log record in this file spans 3 lines.

nxlog.conf
<Extension converter>
    Module          xm_charconv
    InputEncoding   UCS-2BE
</Extension>

<Extension multiline>
    Module          xm_multiline
    FixedLineCount  3
</Extension>

<Input filein>
    Module          im_file
    File            'tmp/input/ucs-2be.in'
    InputType       converter.convert, multiline
</Input>

Example 4. Registering and using a data converter in an output module

This configuration uses the data converter registered via the OutputEncoding directive to store log data into a file with the ISO-8859-2 encoding.

nxlog.conf
<Extension converter>
    Module          xm_charconv
    OutputEncoding  ISO-8859-2
</Extension>

<Input filein>
    Module          im_file
    File            "tmp/input/utf-8.in"
</Input>

<Output fileout>
    Module          om_file
    File            "tmp/iso-8859-2.out"
    OutputType      converter.convert
</Output>