Delimiter-Separated Values (xm_csv)
This module provides functions and procedures for working with data formatted as comma-separated values (CSV). CSV input can be parsed into fields, and fields can be written out as CSV. Delimiters other than the comma are also supported.
Multiple xm_csv module instances with different options can be used in the same configuration file to support different CSV formats. For this reason, functions and procedures exported by this module are public and must be referenced by the module instance name.
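As a minimal sketch of referencing procedures by instance name, the fragment below defines two instances and calls each one explicitly. The instance names csv_comma and csv_semicolon, the field names, and the file path are assumptions chosen for illustration.

```
<Extension csv_comma>
    Module      xm_csv
    Fields      $id, $name
</Extension>

<Extension csv_semicolon>
    Module      xm_csv
    Fields      $id, $name
    Delimiter   ;
</Extension>

<Input in>
    Module      im_file
    File        'path/to/log/file'
    <Exec>
        # Parse with the comma-delimited instance...
        csv_comma->parse_csv();
        # ...and rewrite $raw_event with the semicolon-delimited one.
        csv_semicolon->to_csv();
    </Exec>
</Input>
```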
Configuration
The xm_csv module accepts the following directives in addition to the common module directives. The Fields directive is required.
Required directives
The following directives are required for the module to start.
Fields
This mandatory directive accepts a comma-separated list of fields which will be populated when parsing the input data.
Field names with or without the dollar sign ($) are accepted.
Optional directives
Delimiter
This optional directive specifies the character to be used as the delimiter to separate fields.
See the section on Specifying Delimiter Characters for more information.
The default delimiter character is the comma (,).
EscapeChar
This optional directive specifies the character to be used to escape special characters. See the section on Specifying Escape Characters for more information.
The escape character is used to prefix the following characters: the escape character itself, the quote character, and the delimiter character.
If EscapeControl is |
EscapeControl
If this optional boolean directive is set to |
FieldTypes
This optional directive specifies a list of field types corresponding to the field names defined in the Fields directive. If specified, the number of types must match the number of fields. If this directive is omitted, all fields will be stored as strings. This directive does not affect the fields-to-CSV conversion.
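A short sketch of pairing Fields with a list of types, reusing the field names from the examples on this page. The instance name csv_typed is hypothetical.

```
<Extension csv_typed>
    Module      xm_csv
    # One type per field, in the same order as Fields.
    Fields      $id, $name, $number
    FieldTypes  integer, string, integer
</Extension>
```

With a configuration along these lines, $id and $number should be stored as integers rather than strings after parsing, while $name remains a string.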
QuoteChar
This optional directive specifies the character to be used to quote/enclose fields. See the section on Specifying Quote Characters for more information.
If QuoteOptional is |
QuoteMethod
This optional directive accepts the following values:
Note that this directive only affects CSV generation when using to_csv(). The CSV parser can automatically detect the quotation.
QuoteOptional
This directive has been deprecated in favor of QuoteMethod, which should be used instead.
StrictMode
If this optional boolean directive is set to |
UndefValue
This optional directive specifies a string that will be treated as an undefined value.
This is particularly useful when parsing the W3C format, where the dash (-) represents an undefined value.
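As an illustration, a space-delimited, W3C-style log that marks missing values with a dash might be handled with an instance like the one below. This is a sketch only: the instance name w3c and the field names are assumptions, not taken from a real log definition.

```
<Extension w3c>
    Module      xm_csv
    # Hypothetical field list; adjust to the actual log header.
    Fields      $date, $time, $method, $uri
    # A space must be quoted, because the parser strips whitespace.
    Delimiter   ' '
    # Treat a lone dash as an undefined value.
    UndefValue  -
</Extension>
```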
Specifying Quote, Escape, and Delimiter Characters
The QuoteChar, EscapeChar, and Delimiter directives can be specified in several ways.
- Unquoted single character
-
Any printable character can be specified as an unquoted character, except for the backslash (\):
Delimiter ;
- Control characters
-
The following non-printable characters can be specified with escape sequences:
- \a
-
audible alert (bell)
- \b
-
backspace
- \t
-
horizontal tab
- \n
-
newline
- \v
-
vertical tab
- \f
-
formfeed
- \r
-
carriage return
For example, to use TAB delimiting:
Delimiter \t
- A character in single quotes
-
The configuration parser strips whitespace, so it is not possible to define a space as the delimiter unless it is enclosed within quotes:
Delimiter ' '
Printable characters can also be enclosed:
Delimiter ';'
The backslash can be specified when enclosed within quotes:
Delimiter '\'
- A character in double quotes
-
Double quotes can be used like single quotes:
Delimiter " "
The backslash can be specified when enclosed within double quotes:
Delimiter "\"
- A hexadecimal ASCII code
-
Hexadecimal ASCII character codes can be used when prefixed with 0x. For example, the space can be specified as:
Delimiter 0x20
This is equivalent to:
Delimiter " "
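The notations above can be combined within a single instance. The following sketch, with a hypothetical instance name tsv and illustrative field names, uses a control-character escape sequence for the delimiter and single-quoted characters for the quote and escape characters.

```
<Extension tsv>
    Module      xm_csv
    Fields      $EventTime, $Severity, $Message
    # Control character given as an escape sequence.
    Delimiter   \t
    # A double quote enclosed in single quotes.
    QuoteChar   '"'
    # The backslash must be enclosed in quotes.
    EscapeChar  '\'
</Extension>
```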
Functions
The following functions are exported by xm_csv.
- string
to_csv()
-
Convert the specified fields to a single CSV formatted string.
Procedures
The following procedures are exported by xm_csv.
parse_csv();
-
Parse the $raw_event field as CSV input.
parse_csv(string source);
-
Parse the given string as CSV format.
to_csv();
-
Format the specified fields as CSV and put this into the $raw_event field.
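The difference between the to_csv() function and the to_csv() procedure can be sketched as follows. The fragment assumes an xm_csv instance named csv is defined elsewhere in the configuration, and $CsvLine is a hypothetical field name used only for this illustration.

```
<Input file_input>
    Module      im_file
    File        'path/to/log/file'
    <Exec>
        # Procedure: parse $raw_event into the configured fields.
        csv->parse_csv();

        # Function: returns the CSV string, so it can be assigned to a
        # field without overwriting $raw_event.
        $CsvLine = csv->to_csv();

        # Procedure: writes the CSV string into $raw_event.
        csv->to_csv();
    </Exec>
</Input>
```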
Creating and populating fields
The parse_csv() procedure parses a string containing delimiter-separated values into structured data.
It expects the $raw_event field or the string passed as a parameter to be in the following format:
value1,value2,value3
Once a log record is parsed with this procedure, additional fields are created based on the Fields directive. The fields can be used for further log processing or to convert the log record to a different output format. For an example of how to parse and process CSV log records, see Parsing CSV below.
Input modules may create additional fields containing various information. When converting to a different format, such fields will be included in the output log record, which may consume additional memory and bandwidth. For efficient handling of log records, consult the Fields section in the documentation of input modules and test the configuration before deployment. To delete any unwanted fields, use the delete() procedure or the xm_rewrite extension.
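As a rough sketch of trimming unwanted fields with delete() before conversion, the fragment below removes two fields added by the input module. It assumes that xm_csv and xm_json instances named csv and json are defined elsewhere in the configuration; which fields are safe to drop depends on the deployment.

```
<Input file_input>
    Module      im_file
    File        'path/to/log/file'
    <Exec>
        csv->parse_csv();
        # Remove fields that are not needed downstream.
        delete($SourceModuleName);
        delete($SourceModuleType);
        to_json();
    </Exec>
</Input>
```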
Examples
This configuration uses the im_file input module to collect CSV logs from a file.
Log records are parsed into structured data using the parse_csv() procedure.
If the log record has a severity of INFO, the record is dropped.
Otherwise, the log record is converted to JSON using the to_json() procedure.
<Extension csv>
    Module      xm_csv
    Fields      $EventTime, $Severity, $Message
    Delimiter   ,
</Extension>

<Extension json>
    Module      xm_json
</Extension>

<Input file_input>
    Module      im_file
    File        'path/to/log/file'
    <Exec>
        csv->parse_csv();
        if ($Severity == "INFO")
            drop();
        to_json();
    </Exec>
</Input>
Input sample:
2021-11-04T10:27:45.919858+03:00,ERROR,File not found
Output sample:
{
  "EventReceivedTime": "2021-11-04T10:27:58.919858+03:00",
  "SourceModuleName": "file_input",
  "SourceModuleType": "im_file",
  "EventTime": "2021-11-04T10:27:45.919858+03:00",
  "Severity": "ERROR",
  "Message": "File not found"
}
This example shows that the xm_csv module can do more than parse and generate CSV. With multiple xm_csv module instances, it is also possible to reorder, add, remove, or modify fields and write the data out in a different CSV format.
<Extension csv1>
    Module      xm_csv
    Fields      $id, $name, $number
    FieldTypes  integer, string, integer
    Delimiter   ,
</Extension>

<Extension csv2>
    Module      xm_csv
    Fields      $id, $number, $name, $date
    Delimiter   ;
</Extension>

<Input in>
    Module      im_file
    File        "tmp/input"
    <Exec>
        csv1->parse_csv();
        $date = now();
        if not defined $number $number = 0;
        csv2->to_csv();
    </Exec>
</Input>

<Output out>
    Module      om_file
    File        "tmp/output"
</Output>
Input sample:
1,"John K.",42
2,"Joe F.",43
Output sample:
1;42;"John K.";2011-01-15 23:45:20
2;43;"Joe F.";2011-01-15 23:45:20