Convert character sets with NXLog Agent

Sometimes, you might need to convert data between different character sets, for example, when you collect events from UTF-16-encoded files but your SIEM requires UTF-8. You can do this with NXLog Agent’s xm_charconv module, which lets you configure the input and output encodings and provides functions to detect and convert character sets.

Below, we provide examples of using the xm_charconv module to convert the character encoding of your data.

Auto-detect input encoding

If you have multiple sources producing telemetry data in different character sets and want to normalize them to a single encoding, you can use the AutodetectCharsets directive together with the convert_fields() procedure.

Example 1. Auto-detect and convert character sets

This configuration uses an xm_charconv instance and specifies a list of character sets that input data might use. It then converts all text fields in each record to UTF-8 with the convert_fields() procedure, specifying auto for the input encoding.

nxlog.conf
<Extension charconv>
    Module                xm_charconv
    AutodetectCharsets    utf-8, utf-16, utf-32, iso-8859-2, iso-8859-7, iso-8859-1, euc-jp
</Extension>

<Input input_file>
    Module                im_file
    File                  '/path/to/logs/*'
    Exec                  convert_fields("auto", "utf-8");
</Input>
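
If only one field needs converting, the convert() function can be used instead of convert_fields(). The following is a minimal sketch rather than a definitive configuration. It assumes that convert() takes the source string, the source encoding, and the target encoding, that it accepts auto in the same way convert_fields() does, and that the data of interest is in the $raw_event field. The instance name input_file_raw and the shortened character set list are for illustration only.

nxlog.conf
<Extension charconv>
    Module                xm_charconv
    AutodetectCharsets    utf-8, utf-16, utf-32, iso8859-2
</Extension>

<Input input_file_raw>
    Module                im_file
    File                  '/path/to/logs/*'
    # Assumption: convert(source, source_encoding, target_encoding) returns the converted string.
    # Only $raw_event is converted; other fields are left untouched.
    Exec                  $raw_event = convert($raw_event, "auto", "utf-8");
</Input>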

Convert a specific character set

If you want to convert data between specific character sets, the easiest way is to use the InputEncoding and OutputEncoding directives, which register input reader and output writer functions.

Example 2. Convert a specific character set to UTF-8

This configuration uses an xm_charconv instance with the input encoding set to shift-jis, a character set for the Japanese language. It then sets the InputType of the im_file instance to reference the xm_charconv instance by its name, shift_jis.

Since it does not explicitly set the OutputEncoding, it will output data in the default UTF-8 character set.

nxlog.conf
<Extension shift_jis>
    Module           xm_charconv
    InputEncoding    shift-jis
</Extension>

<Input input_file>
    Module           im_file
    File             '/path/to/logs/*'
    InputType        shift_jis.convert
</Input>
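
To make the target encoding visible in the configuration instead of relying on the default, you can state OutputEncoding explicitly. The sketch below is a variation of the example above under that assumption. It should behave the same way, because utf-8 is the default described earlier. The InputType value is copied unchanged from the example.

nxlog.conf
<Extension shift_jis>
    Module           xm_charconv
    InputEncoding    shift-jis
    # Explicitly state the default target encoding (assumed equivalent to omitting the directive)
    OutputEncoding   utf-8
</Extension>

<Input input_file>
    Module           im_file
    File             '/path/to/logs/*'
    InputType        shift_jis.convert
</Input>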