Convert character sets with NXLog Agent
Sometimes, you might need to convert logs between different character sets, for example, when you collect records from UTF-16-encoded files but your SIEM requires UTF-8. You can do this with NXLog Agent’s xm_charconv module, which lets you configure the input and output encodings and provides functions to detect and convert character sets.
Below, we provide examples of using the xm_charconv module to convert your logs' character encoding.
Auto-detect input encoding
If you have multiple sources producing logs in different character sets that you want to streamline into a single encoding, you can use the AutodetectCharsets directive combined with the convert_fields() procedure.
This configuration uses an xm_charconv instance and specifies a list of character sets that input logs might use.
It then converts all text fields in each record to UTF-8 with the convert_fields() procedure, specifying auto for the input encoding.
<Extension charconv>
    Module               xm_charconv
    AutodetectCharsets   utf-8, utf-16, utf-32, shift-jis, euc-jp
</Extension>

<Input input_file>
    Module    im_file
    File      '/path/to/logs/*'
    Exec      convert_fields("auto", "utf-8");
</Input>
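If you only need to convert a single field rather than every text field, the module's convert() function may be a better fit. The following is a minimal sketch, not a tested configuration. It assumes that convert() takes the source string, the source encoding, and the target encoding in that order, and that it accepts auto as the source encoding in the same way convert_fields() does. Verify both against the xm_charconv reference for your NXLog Agent version. The instance name and file path are the same placeholders used above.

```
<Extension charconv>
    Module               xm_charconv
    AutodetectCharsets   utf-8, utf-16, utf-32, shift-jis, euc-jp
</Extension>

<Input input_file>
    Module    im_file
    File      '/path/to/logs/*'
    # Assumption: convert(source, source_encoding, target_encoding)
    # Converts only $raw_event and leaves all other fields untouched
    Exec      $raw_event = convert($raw_event, "auto", "utf-8");
</Input>
```

Converting a single field keeps the work per record small, but any other fields that carry text in the original encoding stay unconverted.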
Convert a specific character set
If you want to convert logs between specific character sets, the easiest approach is to use the InputEncoding and OutputEncoding directives, which register input reader and output writer functions.
This configuration uses an xm_charconv instance with the input encoding set to shift-jis, a character set for the Japanese language.
It then sets the InputType of the im_file instance to shift_jis.convert, where shift_jis is the name of the xm_charconv instance.
Since it does not explicitly set OutputEncoding, it outputs logs in the default UTF-8 character set.
<Extension shift_jis>
    Module          xm_charconv
    InputEncoding   shift-jis
</Extension>

<Input input_file>
    Module      im_file
    File        '/path/to/logs/*'
    InputType   shift_jis.convert
</Input>
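As an alternative sketch, the convert_fields() procedure from the auto-detection example can take an explicit source encoding instead of auto. This is an illustration, not a tested configuration. It assumes the input module can split the raw bytes into lines before the conversion runs, which may not hold for encodings such as UTF-16, where the line terminator spans more than one byte. In that case, the InputType approach above is likely the safer choice. The instance name charconv is a placeholder.

```
<Extension charconv>
    Module    xm_charconv
</Extension>

<Input input_file>
    Module    im_file
    File      '/path/to/logs/*'
    # Explicit source encoding instead of "auto"
    Exec      convert_fields("shift-jis", "utf-8");
</Input>
```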