Mask sensitive data with NXLog Agent

Log records may include sensitive data, such as personally identifiable information (PII) and credit card details. You might need to mask this data to protect your users and customers or to comply with data protection laws and regulations such as GDPR.

Using NXLog Agent’s Go, Java, Perl, Python, and Ruby modules, you can mask data with an external script. Below, we provide a Python script for masking sensitive data in log records, which you can use with NXLog Agent’s xm_python extension module. It consists of the following functions:

convert_host()

Hashes the value of the $Hostname field if it’s available.

ipv4_encoding()

Searches the $Message field for IPv4 patterns and hashes any matching values.

pass_lib_encoding()

Checks whether the $Message field contains a marker string and, if it does, searches the field for a matching pattern. The current implementation looks for the string cc and, when it is present, finds and hashes MasterCard debit and credit card numbers.

hash_lib_encoding()

An alternative implementation of pass_lib_encoding() that uses Python’s standard hashlib module (salted SHA-256) instead of passlib.

hasher.py
import nxlog
import re
import base64
import hashlib
import uuid
from passlib.hash import pbkdf2_sha256

def regex_convert(content, controller):
    if controller == "cc":
        result = re.findall(r"(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}", content)
    elif controller == "ip":
        result = re.findall(r"(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)", content)
    elif controller == "email":
        # No ^/$ anchors, so addresses embedded in a longer message are also found
        result = re.findall(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+", content)
        # In case the above does not work, the regex below is 99.99% guaranteed
        # to find an email address. This is explained here: https://emailregex.com/
        # result = re.search("(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])",email)
    else:
        # Unknown controller: return an empty list instead of raising UnboundLocalError
        result = []
    return result

def convert_host(event):
    module = event.module
    if 'Hostname' in event.field_names():
        host = event.get_field('Hostname')
        event.set_field('Hostname', pbkdf2_sha256.hash(host))

def ipv4_encoding(event):
    module = event.module
    if 'Message' in event.field_names():
        message = event.get_field('Message')
        check_ip = regex_convert(message, "ip")
        if check_ip:
            for ip_value in check_ip:
                hashed_field = pbkdf2_sha256.hash(ip_value)
                message = message.replace(ip_value, hashed_field)
            event.set_field('Message', message)

def pass_lib_encoding(event):
    module = event.module
    if 'Message' in event.field_names():
        message = event.get_field('Message')
        if 'cc' in message:
            check_result = regex_convert(message, "cc")
            if check_result:
                for cc_value in check_result:
                    hashed_field = pbkdf2_sha256.hash(cc_value)
                    message = message.replace(cc_value, hashed_field)
                event.set_field('Message', message)

# Alternative method of hashing data in messages captured by NXLog Agent
# through a native library
def hash_lib_encoding(event):
    module = event.module
    if 'Message' in event.field_names():
        message = event.get_field('Message')
        if 'cc' in message:
            check_result = regex_convert(message, "cc")
            if check_result:
                for cc_value in check_result:
                    # Random per-value salt; it is not stored, so the digest cannot be recomputed
                    salt = uuid.uuid4().hex
                    # Base64-encode the card number, then hash it together with the salt
                    base64_message = base64.b64encode(cc_value.encode('ascii')).decode('ascii')
                    salted_sha256 = hashlib.sha256((salt + base64_message).encode('ascii')).hexdigest()
                    message = message.replace(cc_value, salted_sha256)
                event.set_field('Message', message)

You can also find this script in our public git repository.

This script is provided "AS IS" without warranty of any kind, either expressed or implied. Use at your own risk.
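To try the masking logic without a running NXLog Agent instance, a small stand-in for the event object can help. The following is a minimal sketch and not part of the script above: FakeEvent and mask_ipv4() are hypothetical names, FakeEvent only imitates the field_names(), get_field(), and set_field() methods that the script calls, and an unsalted SHA-256 digest from hashlib stands in for passlib so that the example runs with the standard library alone.

```python
import hashlib
import re

# Same IPv4 pattern that regex_convert() uses for the "ip" controller.
IPV4_PATTERN = re.compile(
    r"(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
)


class FakeEvent:
    """Hypothetical stand-in for the event object that NXLog Agent passes in."""

    def __init__(self, fields):
        self._fields = dict(fields)

    def field_names(self):
        return list(self._fields)

    def get_field(self, name):
        return self._fields[name]

    def set_field(self, name, value):
        self._fields[name] = value


def mask_ipv4(event):
    """Follow the flow of ipv4_encoding(), using SHA-256 for illustration only."""
    if 'Message' not in event.field_names():
        return
    message = event.get_field('Message')
    # re.sub() replaces each match where it occurs in the text
    masked = IPV4_PATTERN.sub(
        lambda match: hashlib.sha256(match.group(0).encode('ascii')).hexdigest(),
        message,
    )
    event.set_field('Message', masked)


event = FakeEvent({
    'Message': 'local host: 192.168.1.120:12345 () remote host: 192.168.1.122:62722 ()'
})
mask_ipv4(event)
print(event.get_field('Message'))
```

The same stub can be passed to the script's own functions once the nxlog import is stubbed out or removed for the test run.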

Mask hostnames

The above script can mask the $Hostname field. If your log records use a different field for the hostname, change the field name in the script’s convert_host() function.
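As an alternative to editing convert_host() each time the field name changes, the field name can be passed as a parameter. The sketch below illustrates the idea under some assumptions: mask_field(), convert_machine_name(), StubEvent, and the MachineName field are hypothetical names, the colon-separated output layout is invented for this example, and hashlib.pbkdf2_hmac() from the standard library replaces passlib so that the block runs on its own. In hasher.py you could call pbkdf2_sha256.hash() inside mask_field() instead.

```python
import hashlib
import os


def mask_field(event, field_name, rounds=29000):
    """Replace the named field with a salted PBKDF2-SHA256 digest, if it exists."""
    if field_name not in event.field_names():
        return
    value = str(event.get_field(field_name))
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', value.encode('utf-8'), salt, rounds)
    # Hypothetical output layout: label, salt, and digest separated by colons
    event.set_field(field_name, 'pbkdf2-sha256:{}:{}'.format(salt.hex(), digest.hex()))


def convert_machine_name(event):
    """Thin wrapper with the same signature as convert_host()."""
    mask_field(event, 'MachineName')  # 'MachineName' is an assumed field name


class StubEvent:
    """Hypothetical stand-in for the NXLog Agent event object, for local testing."""

    def __init__(self, fields):
        self._fields = dict(fields)

    def field_names(self):
        return list(self._fields)

    def get_field(self, name):
        return self._fields[name]

    def set_field(self, name, value):
        self._fields[name] = value


stub = StubEvent({'MachineName': 'NXLog-Server-1', 'Message': 'Started NXLog daemon.'})
convert_machine_name(stub)
print(stub.get_field('MachineName'))
```

With a wrapper like this in the script, the configuration would call python_call('convert_machine_name') in place of python_call('convert_host').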

Example 1. Masking the hostname field

This configuration reads syslog messages from a file and parses them into structured data using the parse_syslog() procedure of the xm_syslog module. This procedure populates the $Hostname field, required by the convert_host() function, with the hostname parsed from the syslog message.

It then uses the python_call() procedure of the xm_python module to mask the hostname with the convert_host() function provided by the script. Finally, it converts the record to JSON format using the xm_json module.

nxlog.conf
<Extension json>
    Module        xm_json
</Extension>

<Extension syslog>
    Module        xm_syslog
</Extension>

<Extension python>
    Module        xm_python
    PythonCode    '/path/to/hasher.py'
</Extension>

<Input messages>
    Module        im_file
    File          '/var/log/syslog'
    <Exec>
        parse_syslog();
        python_call('convert_host');
        to_json();
    </Exec>
</Input>

Input sample

The following is a syslog message collected from a Linux host.

Oct 14 16:17:48 NXLog-Server-1 systemd[1]: Started NXLog daemon.

Output sample

The following JSON object shows the same log record after NXLog Agent processed it.

{
  "EventReceivedTime": "2023-10-14T16:18:54.822255+02:00",
  "SourceModuleName": "messages",
  "SourceModuleType": "im_file",
  "SyslogFacilityValue": 1,
  "SyslogFacility": "USER",
  "SyslogSeverityValue": 5,
  "SyslogSeverity": "NOTICE",
  "SeverityValue": 2,
  "Severity": "INFO",
  "Hostname": "$pbkdf2-sha256$29000$xNibc07JeU.JUSplTAkB4A$zs7SqFzo1gKu.gm8cDckjq2EM9Nn9QSdolsSMPi/B8c",
  "EventTime": "2023-10-14T16:17:48.000000+02:00",
  "SourceName": "systemd",
  "ProcessID": 1,
  "Message": "Started NXLog daemon."
}

Mask IP addresses

The above script can mask IP addresses in the $Message field. Add or change field names in the script’s ipv4_encoding() function to process a different field.

Example 2. Masking IP addresses in the message field

This configuration reads log records in JSON format from a file and parses them into structured data using the parse_json() procedure of the xm_json module. This procedure populates the $Message field required by the ipv4_encoding() function.

It then uses the python_call() procedure of the xm_python module to mask any IP addresses in the message with the ipv4_encoding() function provided by the script. Finally, it converts the record back to JSON format.

nxlog.conf
<Extension json>
    Module        xm_json
</Extension>

<Extension python>
    Module        xm_python
    PythonCode    '/path/to/hasher.py'
</Extension>

<Input security_log>
    Module        im_file
    File          '/path/to/security/log'
    <Exec>
        parse_json();
        python_call('ipv4_encoding');
        to_json();
    </Exec>
</Input>

Input sample

The following is a log record in JSON format, containing a $Message field with two instances of IPv4 addresses.

{
  "EventTime": "Sat Oct 14 16:50:30 2023",
  "EventType": "Security Warning",
  "Message": "Error: Protocol error (-21), Protocol switch to TCP rejected, close connection (HTTP status code 403, Forbidden) [http_plg.c 5678] local host:  192.168.1.120:12345 () remote host: 192.168.1.122:62722 ()"
}

Output sample

The following JSON object shows the same log record after NXLog Agent processed it.

{
  "EventReceivedTime": "2023-10-14T16:51:26.549994+02:00",
  "SourceModuleName": "security_log",
  "SourceModuleType": "im_file",
  "EventTime": "Sat Oct 14 16:50:30 2023",
  "EventType": "Security Warning",
  "Message": "Error: Protocol error (-21), Protocol switch to TCP rejected, close connection (HTTP status code 403, Forbidden) [http_plg.c 5678] local host:  $pbkdf2-sha256$29000$9t4bg5DSupdyLqW09p7TOg$8vMWXRO5vT9HrwoHnYZgVUMc/Dgk8IPxkclcZtF25YY:12345 () remote host: $pbkdf2-sha256$29000$FQJg7L3XGoPQmhMipDSG8A$/lJ1/iSMVwHIsEnbLBxV7HE7EGvuHDvnhQpvLZ8F4nA:62722 ()"
}
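In the output above, the two masked values carry different salts (the segment after the round count 29000), because passlib generates a new random salt on every hash() call. As a result, the same IP address is masked to a different string each time it appears, so records can no longer be correlated by address. If you need that correlation, a keyed hash is one option. The following is a minimal sketch of that swapped-in technique, not the script's method: SECRET_KEY, pseudonymize(), mask_ipv4_text(), and ipv4_pseudonymize() are hypothetical names, and the key shown is a placeholder.

```python
import hashlib
import hmac
import re

# Placeholder key for illustration; load a real secret from a protected location.
SECRET_KEY = b"replace-with-a-long-random-secret"

# Same IPv4 pattern that regex_convert() uses for the "ip" controller.
IPV4_PATTERN = re.compile(
    r"(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
)


def pseudonymize(value, key=SECRET_KEY):
    """Return an HMAC-SHA256 token that is stable for a given key and value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_ipv4_text(message, key=SECRET_KEY):
    """Replace every IPv4 address in the text with its keyed token."""
    return IPV4_PATTERN.sub(lambda match: pseudonymize(match.group(0), key), message)


def ipv4_pseudonymize(event):
    """Hypothetical counterpart to ipv4_encoding() for use with python_call()."""
    if 'Message' in event.field_names():
        event.set_field('Message', mask_ipv4_text(event.get_field('Message')))


first = mask_ipv4_text("login from 192.168.1.120")
second = mask_ipv4_text("logout from 192.168.1.120")
# The token for the address is identical in both records
print(first.split()[-1] == second.split()[-1])
```

The trade-off is that anyone holding the key can recompute the token for a guessed address, so the key needs the same protection as the data it masks.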

Mask credit card numbers

The above script can mask MasterCard debit and credit card numbers in $Message fields containing the text cc. Change the script’s pass_lib_encoding() function to customize the field name or identifying text. You can change or add credit card providers by modifying the script’s regex_convert() function.
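As an illustration of adding a provider, the sketch below keeps the patterns in a dictionary and combines them into one expression. It rests on some assumptions: CARD_PATTERNS, luhn_valid(), and find_card_numbers() are hypothetical names, the Visa entry assumes 16-digit numbers starting with 4, and the Luhn checksum filter is an optional addition that the script above does not perform. The lookarounds prevent a match inside a longer run of digits.

```python
import re

# The MasterCard pattern is the one used in regex_convert();
# the Visa pattern is an assumption added for this example.
CARD_PATTERNS = {
    "mastercard": r"(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}",
    "visa": r"4[0-9]{15}",
}

# Reject candidates that are part of a longer digit sequence
CARD_REGEX = re.compile(
    r"(?<![0-9])(?:" + "|".join(CARD_PATTERNS.values()) + r")(?![0-9])"
)


def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            # Double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(content):
    """Return card number candidates that match a pattern and pass the Luhn check."""
    return [match for match in CARD_REGEX.findall(content) if luhn_valid(match)]


sample = "INSERT INTO cc_details VALUES(1234, 5577000055770004, 2026, 4111111111111111);"
print(find_card_numbers(sample))
```

Drop the luhn_valid() filter if your logs may contain test or malformed numbers that should still be masked.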

Example 3. Masking MasterCard debit and credit card numbers

This configuration reads SQL audit log records from a file and copies each record’s $raw_event value to the $Message field, which the pass_lib_encoding() function requires.

It then uses the python_call() procedure of the xm_python module to mask any credit card numbers in the message with the pass_lib_encoding() function provided by the script. Finally, it updates the value of the $raw_event field with the updated message.

nxlog.conf
<Extension python>
    Module        xm_python
    PythonCode    '/path/to/hasher.py'
</Extension>

<Input audit_log>
    Module        im_file
    File          '/path/to/audit/log'
    <Exec>
        $Message = $raw_event;
        python_call('pass_lib_encoding');
        $raw_event = $Message;
    </Exec>
</Input>

Input sample

The following is a SQL audit log record containing credit card details.

2023-10-14 17:54:32|TERMINAL1|User1|cc_details|INSERT INTO cc_details VALUES(1234, 5577000055770004, 2026, 326, 2023, 4, 'John', 'Doe', '123 8th Avenue', 'New York', 'NY', '10019', 'US');

Output sample

The following is the same log record after NXLog Agent processed it.

2023-10-14 17:54:32|TERMINAL1|User1|cc_details|INSERT INTO cc_details VALUES(1234, $pbkdf2-sha256$29000$ZIzRGuOcs3aOUapVSgmh9A$g./AOB07kTCEzYwBlcs822APRXr5swHEztZhaAcGrFA, 2026, 326, 2023, 4, 'John', 'Doe', '123 8th Avenue', 'New York', 'NY', '10019', 'US');