Data masking
Log records may include sensitive data, such as personally identifiable information (PII) and credit card details. This data often needs to be masked for security reasons and to protect your users and customers. Data masking is also a requirement for compliance with data protection laws and regulations such as GDPR. NXLog can modify log records, as described in Rewriting and modifying logs, and can mask data by calling functions from an external script written in Go, Java, Perl, Python, or Ruby.
Masking data with an external script
An example Python script for masking data is available in our public git repository. The following examples show how this script can be used with NXLog to mask sensitive data.
This example reads syslog messages from a file and masks the hostname using the convert_host() function provided by the script. Records are parsed into structured data using the parse_syslog() procedure of the xm_syslog module. This procedure populates the $Hostname field, which the convert_host() function requires, with the hostname parsed from the syslog message. Finally, the record is converted to JSON format using the xm_json module.
<Extension json>
    Module        xm_json
</Extension>
<Extension syslog>
    Module        xm_syslog
</Extension>
<Extension python>
    Module        xm_python
    PythonCode    /opt/nxlog/etc/hasher.py
</Extension>
<Input messages>
    Module        im_file
    File          '/var/log/syslog'
    <Exec>
        # Parse the syslog message into fields, including $Hostname
        parse_syslog();
        # Mask $Hostname using the function from the Python script
        python_call('convert_host');
        # Convert the record to JSON
        to_json();
    </Exec>
</Input>
The following is a syslog message collected from a Linux host.
Jul 29 16:17:48 NXLog-Server-1 systemd[1]: Started NXLog daemon.
The following JSON object shows the same log record after it was processed by NXLog.
{
  "EventReceivedTime": "2021-07-29T16:18:54.822255+02:00",
  "SourceModuleName": "messages",
  "SourceModuleType": "im_file",
  "SyslogFacilityValue": 1,
  "SyslogFacility": "USER",
  "SyslogSeverityValue": 5,
  "SyslogSeverity": "NOTICE",
  "SeverityValue": 2,
  "Severity": "INFO",
  "Hostname": "$pbkdf2-sha256$29000$xNibc07JeU.JUSplTAkB4A$zs7SqFzo1gKu.gm8cDckjq2EM9Nn9QSdolsSMPi/B8c",
  "EventTime": "2021-07-29T16:17:48.000000+02:00",
  "SourceName": "systemd",
  "ProcessID": 1,
  "Message": "Started NXLog daemon."
}
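The masked Hostname value above has the layout $pbkdf2-sha256$<rounds>$<salt>$<checksum>, which is the modular crypt format for PBKDF2-HMAC-SHA256 hashes. The following is a minimal sketch of how such a value could be produced with only the Python standard library. It is not the code of the script in the repository: the helper names mask_value and _ab64, the 16-byte random salt, and the body of convert_host shown here are assumptions made for illustration. The sketch also assumes that the function called from NXLog receives an event object with get_field() and set_field() methods.

```python
import base64
import hashlib
import os
from typing import Optional


def _ab64(data: bytes) -> str:
    """Base64 without padding and with '+' replaced by '.', as seen in the hashes above."""
    return base64.b64encode(data).decode("ascii").rstrip("=").replace("+", ".")


def mask_value(value: str, rounds: int = 29000, salt: Optional[bytes] = None) -> str:
    """Return a PBKDF2-HMAC-SHA256 hash of value in modular crypt layout."""
    if salt is None:
        salt = os.urandom(16)  # assumption: a fresh random salt for each value
    digest = hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, rounds)
    return "$pbkdf2-sha256${}${}${}".format(rounds, _ab64(salt), _ab64(digest))


def convert_host(event):
    """Hypothetical stand-in for the script's function: hash the $Hostname field."""
    hostname = event.get_field("Hostname")
    if hostname:
        event.set_field("Hostname", mask_value(str(hostname)))
```

With a random salt, the same hostname produces a different masked value on every call. Passing a fixed salt to mask_value makes the output repeatable, which allows events to be correlated by host at the cost of weaker protection.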
This example reads log records in JSON format from a file and masks IPv4 addresses using the ipv4_encoding() function provided by the script. This function checks the $Message field for the IPv4 pattern and masks any matching values. To populate the $Message field, records are parsed into structured data using the parse_json() procedure of the xm_json module. Once processed, the updated record is converted back to JSON format.
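<Extension json>
    Module        xm_json
</Extension>
<Extension python>
    Module        xm_python
    PythonCode    /opt/nxlog/etc/hasher.py
</Extension>
<Input security_log>
    Module        im_file
    File          '/path/to/security/log'
    <Exec>
        # Parse the JSON record into fields, including $Message
        parse_json();
        # Mask IPv4 addresses found in $Message
        python_call('ipv4_encoding');
        # Convert the updated record back to JSON
        to_json();
    </Exec>
</Input>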
The following is a log record in JSON format, containing a $Message field with two instances of IPv4 addresses.
{
  "EventTime": "Wed Jul 29 16:50:30 2021",
  "EventType": "Security Warning",
  "Message": "Error: Protocol error (-21), Protocol switch to TCP rejected, close connection (HTTP status code 403, Forbidden) [http_plg.c 5678] local host: 192.168.1.120:12345 () remote host: 192.168.1.122:62722 ()"
}
The following JSON object shows the same log record after it was processed by NXLog.
{
  "EventReceivedTime": "2021-07-29T16:51:26.549994+02:00",
  "SourceModuleName": "security_log",
  "SourceModuleType": "im_file",
  "EventTime": "Wed Jul 29 16:50:30 2021",
  "EventType": "Security Warning",
  "Message": "Error: Protocol error (-21), Protocol switch to TCP rejected, close connection (HTTP status code 403, Forbidden) [http_plg.c 5678] local host: $pbkdf2-sha256$29000$9t4bg5DSupdyLqW09p7TOg$8vMWXRO5vT9HrwoHnYZgVUMc/Dgk8IPxkclcZtF25YY:12345 () remote host: $pbkdf2-sha256$29000$FQJg7L3XGoPQmhMipDSG8A$/lJ1/iSMVwHIsEnbLBxV7HE7EGvuHDvnhQpvLZ8F4nA:62722 ()"
}
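The pattern matching step described for this example can be sketched as follows. This is an illustration only and may differ from what the script in the repository does: the names IPV4_RE and mask_ipv4 are hypothetical, and the masking function is passed in as a parameter so that any hashing routine can be used. The regular expression limits each octet to 0-255 and leaves port numbers untouched.

```python
import re

# A single octet in the range 0-255.
_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1?[0-9]?[0-9])"

# Four octets separated by dots, delimited by word boundaries.
IPV4_RE = re.compile(r"\b" + _OCTET + r"(?:\." + _OCTET + r"){3}\b")


def mask_ipv4(message, mask):
    """Return message with every IPv4 address replaced by mask(address)."""
    return IPV4_RE.sub(lambda match: mask(match.group(0)), message)


if __name__ == "__main__":
    text = "local host: 192.168.1.120:12345 () remote host: 192.168.1.122:62722 ()"
    # A fixed placeholder stands in for a real hashing function here.
    print(mask_ipv4(text, lambda address: "<masked>"))
    # prints: local host: <masked>:12345 () remote host: <masked>:62722 ()
```

Keeping the pattern and the masking function separate makes it possible to test the matching logic without computing any hashes.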
This example reads SQL audit log records from a file and masks credit card numbers using the pass_lib_encoding() function provided by the script. This function checks the $Message field for the text cc. If found, it continues to check for MasterCard debit or credit card numbers and masks any matching values. The configuration populates the $Message field with the content of the $raw_event field. Once processed, the updated text is written back to the $raw_event field.
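<Extension python>
    Module        xm_python
    PythonCode    /opt/nxlog/etc/hasher.py
</Extension>
<Input audit_log>
    Module        im_file
    File          '/path/to/audit/log'
    <Exec>
        # The script works on $Message, so copy the raw record into it
        $Message = $raw_event;
        # Mask credit card numbers found in $Message
        python_call('pass_lib_encoding');
        # Write the masked text back to the raw record
        $raw_event = $Message;
    </Exec>
</Input>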
The following is a SQL audit log record containing credit card details.
2021-07-29 17:54:32|TERMINAL1|User1|cc_details|INSERT INTO cc_details VALUES(1234, 5577000055770004, 2024, 326, 2023, 4, 'John', 'Doe', '123 8th Avenue', 'New York', 'NY', '10019', 'US');
The following is the same log record after it was processed by NXLog.
2021-07-29 17:54:32|TERMINAL1|User1|cc_details|INSERT INTO cc_details VALUES(1234, $pbkdf2-sha256$29000$ZIzRGuOcs3aOUapVSgmh9A$g./AOB07kTCEzYwBlcs822APRXr5swHEztZhaAcGrFA, 2024, 326, 2023, 4, 'John', 'Doe', '123 8th Avenue', 'New York', 'NY', '10019', 'US');
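The two-stage check described for this example, first looking for the text cc and then for MasterCard numbers, can be sketched as follows. This is an assumption-based illustration rather than the script's actual code: the names MASTERCARD_RE and mask_cards are hypothetical, and the pattern relies on the published MasterCard ranges (16 digits starting with 51-55 or 2221-2720). It only matches numbers written without spaces or dashes.

```python
import re

# 16-digit numbers starting with 51-55 or 2221-2720.
MASTERCARD_RE = re.compile(
    r"\b(?:5[1-5][0-9]{14}"
    r"|2(?:22[1-9]|2[3-9][0-9]|[3-6][0-9]{2}|7[01][0-9]|720)[0-9]{12})\b"
)


def mask_cards(message, mask, marker="cc"):
    """Mask MasterCard numbers in message, but only if it contains marker."""
    if marker not in message:
        return message  # cheap pre-check: skip the regular expression entirely
    return MASTERCARD_RE.sub(lambda match: mask(match.group(0)), message)
```

The marker test acts as an inexpensive filter, so the regular expression only runs on records that are likely to contain card data.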