Data masking

Log records may include sensitive data, such as personally identifiable information (PII) and credit card details. This data often needs to be masked to protect your users and customers, and masking is frequently required for compliance with data protection laws and regulations such as GDPR. NXLog can modify log records, as described in Rewriting and modifying logs, and can also mask data by calling functions from an external script written in Go, Java, Perl, Python, or Ruby.

Masking data with an external script

An example Python script for masking data is available in our public git repository. The following examples show how this script can be used with NXLog to mask sensitive data.

Example 1. Masking hostnames

This example reads syslog messages from a file and masks the hostname using the convert_host() function provided by the script. Records are first parsed into structured data using the parse_syslog() procedure of the xm_syslog module. This procedure populates the $Hostname field, which convert_host() requires, with the hostname parsed from the syslog message. Finally, the record is converted to JSON format using the to_json() procedure of the xm_json module.

nxlog.conf
<Extension json>
    Module        xm_json
</Extension>

<Extension syslog>
    Module        xm_syslog
</Extension>

<Extension python>
    Module        xm_python
    PythonCode    /opt/nxlog/etc/hasher.py
</Extension>

<Input messages>
    Module        im_file
    File          '/var/log/syslog'
    <Exec>
        parse_syslog();
        python_call('convert_host');
        to_json();
    </Exec>
</Input>
Input sample

The following is a syslog message collected from a Linux host.

Jul 29 16:17:48 NXLog-Server-1 systemd[1]: Started NXLog daemon.
Output sample

The following JSON object shows the same log record after it was processed by NXLog.

{
  "EventReceivedTime": "2021-07-29T16:18:54.822255+02:00",
  "SourceModuleName": "messages",
  "SourceModuleType": "im_file",
  "SyslogFacilityValue": 1,
  "SyslogFacility": "USER",
  "SyslogSeverityValue": 5,
  "SyslogSeverity": "NOTICE",
  "SeverityValue": 2,
  "Severity": "INFO",
  "Hostname": "$pbkdf2-sha256$29000$xNibc07JeU.JUSplTAkB4A$zs7SqFzo1gKu.gm8cDckjq2EM9Nn9QSdolsSMPi/B8c",
  "EventTime": "2021-07-29T16:17:48.000000+02:00",
  "SourceName": "systemd",
  "ProcessID": 1,
  "Message": "Started NXLog daemon."
}
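
The following is a minimal sketch of how a function such as convert_host() could be written. It is not the script from the repository, which may be implemented differently. It assumes that xm_python passes the event to the called function and that the event exposes get_field() and set_field(). The mask_value() helper is a hypothetical name, and the hash layout is only assumed to mirror the $pbkdf2-sha256$29000$... values shown in the output sample.

```python
import base64
import hashlib
import os

ROUNDS = 29000  # assumption: taken from the "29000" seen in the output sample


def _ab64(data):
    """Base64 without padding and with '+' replaced by '.' (assumed layout)."""
    return base64.b64encode(data).decode("ascii").rstrip("=").replace("+", ".")


def mask_value(value, salt=None):
    """Hypothetical helper: return a salted PBKDF2-SHA256 hash of value."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for every call
    digest = hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, ROUNDS)
    return "$pbkdf2-sha256${}${}${}".format(ROUNDS, _ab64(salt), _ab64(digest))


def convert_host(event):
    """Called from nxlog.conf with python_call('convert_host')."""
    hostname = event.get_field("Hostname")
    if hostname:
        # Overwrite the parsed hostname with its masked form.
        event.set_field("Hostname", mask_value(hostname))
```

Because this sketch uses a random salt, the same hostname is masked to a different value each time, so masked records cannot be correlated by host. Passing a fixed salt would make the output repeatable, at the cost of making the masked values easier to guess.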
Example 2. Masking IPs

This example reads log records in JSON format from a file and masks IPv4 addresses using the ipv4_encoding() function provided by the script. This function searches the $Message field for IPv4 address patterns and masks every match. To populate the $Message field, records are parsed into structured data using the parse_json() procedure of the xm_json module. Once processed, the updated record is converted back to JSON format.

nxlog.conf
<Extension json>
    Module        xm_json
</Extension>

<Extension python>
    Module        xm_python
    PythonCode    /opt/nxlog/etc/hasher.py
</Extension>

<Input security_log>
    Module        im_file
    File          '/path/to/security/log'
    <Exec>
        parse_json();
        python_call('ipv4_encoding');
        to_json();
    </Exec>
</Input>
Input sample

The following is a log record in JSON format. Its $Message field contains two IPv4 addresses.

{
  "EventTime": "Wed Jul 29 16:50:30 2021",
  "EventType": "Security Warning",
  "Message": "Error: Protocol error (-21), Protocol switch to TCP rejected, close connection (HTTP status code 403, Forbidden) [http_plg.c 5678] local host:  192.168.1.120:12345 () remote host: 192.168.1.122:62722 ()"
}
Output sample

The following JSON object shows the same log record after it was processed by NXLog.

{
  "EventReceivedTime": "2021-07-29T16:51:26.549994+02:00",
  "SourceModuleName": "security_log",
  "SourceModuleType": "im_file",
  "EventTime": "Wed Jul 29 16:50:30 2021",
  "EventType": "Security Warning",
  "Message": "Error: Protocol error (-21), Protocol switch to TCP rejected, close connection (HTTP status code 403, Forbidden) [http_plg.c 5678] local host:  $pbkdf2-sha256$29000$9t4bg5DSupdyLqW09p7TOg$8vMWXRO5vT9HrwoHnYZgVUMc/Dgk8IPxkclcZtF25YY:12345 () remote host: $pbkdf2-sha256$29000$FQJg7L3XGoPQmhMipDSG8A$/lJ1/iSMVwHIsEnbLBxV7HE7EGvuHDvnhQpvLZ8F4nA:62722 ()"
}
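
The pattern matching described for ipv4_encoding() can be sketched as follows. This is an illustration under assumptions, not the repository script. The regular expression, the mask_ipv4_addresses() and mask_value() helper names, and the hash layout are all hypothetical, and the event is assumed to expose get_field() and set_field().

```python
import base64
import hashlib
import os
import re

ROUNDS = 29000  # assumption: taken from the "29000" seen in the output sample

# Assumed pattern: four dot-separated octets, each in the range 0-255.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(r"\b(?:" + _OCTET + r"\.){3}" + _OCTET + r"\b")


def _ab64(data):
    """Base64 without padding and with '+' replaced by '.' (assumed layout)."""
    return base64.b64encode(data).decode("ascii").rstrip("=").replace("+", ".")


def mask_value(value):
    """Hypothetical helper: return a salted PBKDF2-SHA256 hash of value."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, ROUNDS)
    return "$pbkdf2-sha256${}${}${}".format(ROUNDS, _ab64(salt), _ab64(digest))


def mask_ipv4_addresses(message):
    """Replace every IPv4 address in message; ports and other text stay as-is."""
    return IPV4.sub(lambda match: mask_value(match.group(0)), message)


def ipv4_encoding(event):
    """Called from nxlog.conf with python_call('ipv4_encoding')."""
    message = event.get_field("Message")
    if message:
        event.set_field("Message", mask_ipv4_addresses(message))
```

Only the address itself is replaced, which is why the port numbers following each address remain readable in the output sample.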
Example 3. Masking credit card numbers

This example reads SQL audit log records from a file and masks credit card numbers using the pass_lib_encoding() function provided by the script. This function first checks whether the $Message field contains the text cc. If it does, the function searches for MasterCard debit or credit card numbers and masks every match. Because the records are not parsed, the configuration copies the content of the $raw_event field to the $Message field before calling the function. Once processed, the updated text is written back to the $raw_event field.

nxlog.conf
<Extension python>
    Module        xm_python
    PythonCode    /opt/nxlog/etc/hasher.py
</Extension>

<Input audit_log>
    Module        im_file
    File          '/path/to/audit/log'
    <Exec>
        $Message = $raw_event;
        python_call('pass_lib_encoding');
        $raw_event = $Message;
    </Exec>
</Input>
Input sample

The following is a SQL audit log record containing credit card details.

2021-07-29 17:54:32|TERMINAL1|User1|cc_details|INSERT INTO cc_details VALUES(1234, 5577000055770004, 2024, 326, 2023, 4, 'John', 'Doe', '123 8th Avenue', 'New York', 'NY', '10019', 'US');
Output sample

The following is the same log record after it was processed by NXLog.

2021-07-29 17:54:32|TERMINAL1|User1|cc_details|INSERT INTO cc_details VALUES(1234, $pbkdf2-sha256$29000$ZIzRGuOcs3aOUapVSgmh9A$g./AOB07kTCEzYwBlcs822APRXr5swHEztZhaAcGrFA, 2024, 326, 2023, 4, 'John', 'Doe', '123 8th Avenue', 'New York', 'NY', '10019', 'US');
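
The two-step check described for pass_lib_encoding() can be sketched as follows. This is an illustration under assumptions, not the repository script. The card number pattern (16 digits starting with 51-55 or 2221-2720), the mask_card_numbers() and mask_value() helper names, and the hash layout are all hypothetical, and the event is assumed to expose get_field() and set_field().

```python
import base64
import hashlib
import os
import re

ROUNDS = 29000  # assumption: taken from the "29000" seen in the output sample

# Assumed MasterCard pattern: 16 digits starting with 51-55 or 2221-2720,
# not preceded or followed by another digit.
MASTERCARD = re.compile(
    r"(?<!\d)(?:5[1-5]\d{14}"
    r"|2(?:22[1-9]|2[3-9]\d|[3-6]\d{2}|7[01]\d|720)\d{12})(?!\d)"
)


def _ab64(data):
    """Base64 without padding and with '+' replaced by '.' (assumed layout)."""
    return base64.b64encode(data).decode("ascii").rstrip("=").replace("+", ".")


def mask_value(value):
    """Hypothetical helper: return a salted PBKDF2-SHA256 hash of value."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, ROUNDS)
    return "$pbkdf2-sha256${}${}${}".format(ROUNDS, _ab64(salt), _ab64(digest))


def mask_card_numbers(message):
    """Mask card numbers, but only in messages that contain the text 'cc'."""
    if "cc" not in message:
        return message  # cheap pre-check: skip the regex for unrelated records
    return MASTERCARD.sub(lambda match: mask_value(match.group(0)), message)


def pass_lib_encoding(event):
    """Called from nxlog.conf with python_call('pass_lib_encoding')."""
    message = event.get_field("Message")
    if message:
        event.set_field("Message", mask_card_numbers(message))
```

Shorter numeric values in the same record, such as the expiry year or the security code, do not match the 16-digit pattern and are left unchanged.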