Character Set Conversion (xm_charconv)
This module provides tools for converting strings between different character sets (code pages).
All the encodings available to iconv are supported.
On GNU/Linux systems, run iconv -l for a list of encoding names.
The functionality of xm_charconv can be combined with other modules providing data conversion such as xm_crypto or xm_zlib.
To examine the supported platforms, see the list of installation packages.
Configuration
The xm_charconv module accepts the following directives in addition to the common module directives.
Optional directives
AutodetectCharsets
This optional directive accepts a comma-separated list of character set names. When auto is specified as the source encoding, as in convert_fields("auto", "utf-8"), the character sets in this list are tried for conversion.
InputEncoding
If this optional directive is specified with an encoding, a data converter is registered to convert from the specified encoding. If this directive is not specified, it defaults to UTF-8.
OutputEncoding
If this optional directive is specified with an encoding, a data converter is registered to convert to the specified encoding. If this directive is not specified, it defaults to UTF-8.
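As a minimal sketch of how these directives fit together, a single instance could declare both encodings, so that its converter translates from InputEncoding to OutputEncoding. The instance name converter and the chosen encodings are illustrative assumptions, not values required by the module.

```conf
<Extension converter>
    Module          xm_charconv
    # Encoding of the incoming data (UTF-8 if omitted)
    InputEncoding   ISO-8859-2
    # Encoding the data is converted to (UTF-8 if omitted)
    OutputEncoding  UTF-8
</Extension>
```

Because both directives default to UTF-8, the OutputEncoding line here only makes the default explicit.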
Data conversion
The xm_charconv module implements a data converter for use with the im_file module. It is specified in the InputType directive of the im_file module (the examples below also use it in the OutputType directive of om_file) and is invoked using dot notation:
<InstanceName>.<DataConverterName>
where <InstanceName> is the given name of the xm_charconv instance and <DataConverterName> is the name of the converter being invoked.
The following data converter is available:
- convert: This data converter converts data from the encoding specified in InputEncoding to the encoding specified in OutputEncoding. In the InputType directive, specify the converter before the input reader function.
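The ordering rule above can be sketched as follows. This assumes that the built-in LineBased reader can follow the converter in the InputType list in the same way as an xm_multiline instance does. The file path and the UTF-16LE source encoding are hypothetical.

```conf
<Extension converter>
    Module          xm_charconv
    InputEncoding   UTF-16LE
</Extension>

<Input filein>
    Module          im_file
    File            "tmp/input/utf-16le.in"
    # Converter first, then the input reader function (assumed: LineBased)
    InputType       converter.convert, LineBased
</Input>
```

With this ordering the reader receives data that the converter has already converted, by default to UTF-8.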
 
Examples
This configuration shows an example of character set auto-detection. The input file can contain lines with different encodings, and the module normalizes output to UTF-8.
<Extension converter>
    Module              xm_charconv
    AutodetectCharsets  utf-8, euc-jp, utf-16, utf-32, iso8859-2
</Extension>
<Input filein>
    Module              im_file
    File                "tmp/input"
    Exec                convert_fields("auto", "utf-8");
</Input>
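When the source encoding is known in advance, the same procedure can presumably be called with an explicit encoding name in place of auto, in which case AutodetectCharsets is not needed. The following is a sketch under that assumption, reusing the instance name, file path, and the iso8859-2 encoding name from the example above.

```conf
<Extension converter>
    Module              xm_charconv
</Extension>

<Input filein>
    Module              im_file
    File                "tmp/input"
    # Convert all fields from a fixed source encoding to UTF-8
    Exec                convert_fields("iso8859-2", "utf-8");
</Input>
```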
This configuration uses the data converter registered via the InputEncoding directive to read a file with the ISO-8859-2 encoding.
<Extension converter>
    Module          xm_charconv
    InputEncoding   ISO-8859-2
</Extension>
<Input filein>
    Module          im_file
    File            "tmp/input/iso-8859-2.in"
    InputType       converter.convert
</Input>
This configuration chains the data converter with an xm_multiline instance in the InputType directive to read a file with UCS-2BE encoding. Each log record in this file spans three lines.
<Extension converter>
    Module          xm_charconv
    InputEncoding   UCS-2BE
</Extension>
<Extension multiline>
    Module          xm_multiline
    FixedLineCount  3
</Extension>
<Input filein>
    Module          im_file
    File            'tmp/input/ucs-2be.in'
    InputType       converter.convert, multiline
</Input>
This configuration uses the data converter registered via the OutputEncoding directive to store log data into a file with the ISO-8859-2 encoding.
<Extension converter>
    Module          xm_charconv
    OutputEncoding  ISO-8859-2
</Extension>
<Input filein>
    Module          im_file
    File            "tmp/input/utf-8.in"
</Input>
<Output fileout>
    Module          om_file
    File            "tmp/iso-8859-2.out"
    OutputType      converter.convert
</Output>
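The input and output converters can also be combined. The sketch below uses two xm_charconv instances, one per direction, to read an ISO-8859-2 file and write it out in another encoding. The instance names from_latin2 and to_utf16, the output path, the UTF-16LE target encoding, and the route name are assumptions made for illustration.

```conf
<Extension from_latin2>
    Module          xm_charconv
    # Convert from ISO-8859-2 to the default UTF-8 on input
    InputEncoding   ISO-8859-2
</Extension>

<Extension to_utf16>
    Module          xm_charconv
    # Convert from the default UTF-8 to UTF-16LE on output
    OutputEncoding  UTF-16LE
</Extension>

<Input filein>
    Module          im_file
    File            "tmp/input/iso-8859-2.in"
    InputType       from_latin2.convert
</Input>

<Output fileout>
    Module          om_file
    File            "tmp/utf-16le.out"
    OutputType      to_utf16.convert
</Output>

<Route file_to_file>
    Path            filein => fileout
</Route>
```

Using separate instances keeps each direction's encoding explicit, since each instance leaves the other side at its UTF-8 default.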