Mask sensitive data with NXLog Agent
Log records may include sensitive data, such as personally identifiable information (PII) and credit card details. You might need to mask this data for security reasons to protect your users/customers or comply with data protection laws and regulations such as GDPR.
Using NXLog Agent’s Go, Java, Perl, Python, and Ruby modules, you can mask data with an external script. Below, we provide a Python script for masking sensitive data in log records, which you can use with NXLog Agent’s xm_python extension module. It consists of the following functions:
- convert_host(): Hashes the value of the $Hostname field if it’s available.
- ipv4_encoding(): Searches the $Message field for IPv4 patterns and hashes any matching values.
- pass_lib_encoding(): Searches the $Message field for a matching string and, if found, processes the field further for a matching pattern. The current implementation searches for the cc string and, if found, continues to search for and hash MasterCard debit or credit card numbers.
- hash_lib_encoding(): An alternative implementation of pass_lib_encoding() using the native hashlib Python module.
import nxlog
import re
import base64
import hashlib
import uuid
from passlib.hash import pbkdf2_sha256

def regex_convert(content, controller):
    """Return all substrings of content that match the selected pattern."""
    result = []  # returned as-is when the controller is not recognized
    if controller == "cc":
        # MasterCard numbers: prefixes 51-55 and 2221-2720, 16 digits in total
        result = re.findall(r"(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}", content)
    elif controller == "ip":
        result = re.findall(r"(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)", content)
    elif controller == "email":
        # Anchored with ^ and $, so it matches only when content is a single email address
        result = re.findall(r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)", content)
        # In case the above does not work, the regex below is 99.99% guaranteed
        # to find an email address. This is explained here: https://emailregex.com/
        # result = re.search("(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])",email)
    return result


def convert_host(event):
    # Replace the hostname with a salted PBKDF2-SHA256 hash
    if 'Hostname' in event.field_names():
        host = event.get_field('Hostname')
        event.set_field('Hostname', pbkdf2_sha256.hash(host))


def ipv4_encoding(event):
    # Hash every IPv4 address found in the message
    if 'Message' in event.field_names():
        message = event.get_field('Message')
        check_ip = regex_convert(message, "ip")
        if check_ip:
            for ip_value in check_ip:
                hashed_field = pbkdf2_sha256.hash(ip_value)
                message = message.replace(ip_value, hashed_field)
            event.set_field('Message', message)


def pass_lib_encoding(event):
    # Hash MasterCard numbers, but only in messages that contain the text "cc"
    if 'Message' in event.field_names():
        message = event.get_field('Message')
        if 'cc' in message:
            check_result = regex_convert(message, "cc")
            if check_result:
                for cc_value in check_result:
                    hashed_field = pbkdf2_sha256.hash(cc_value)
                    message = message.replace(cc_value, hashed_field)
                event.set_field('Message', message)


# Alternative method of hashing data in messages captured by NXLog Agent
# through a native library
def hash_lib_encoding(event):
    if 'Message' in event.field_names():
        message = event.get_field('Message')
        if 'cc' in message:
            check_result = regex_convert(message, "cc")
            if check_result:
                for cc_value in check_result:
                    # A fresh random salt is generated for every value and not stored
                    salt = uuid.uuid4().hex
                    encoded_result = cc_value.encode('ascii')
                    base64_bytes = base64.b64encode(encoded_result)
                    base64_message = base64_bytes.decode('ascii')
                    salted_sha256 = hashlib.sha256((salt + base64_message).encode('ascii')).hexdigest()
                    message = message.replace(cc_value, salted_sha256)
                event.set_field('Message', message)

You can also find this script in our public git repository.
This script is provided "AS IS" without warranty of any kind, either expressed or implied. Use at your own risk.
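The hashing steps of hash_lib_encoding() can be tried on a plain string without NXLog Agent. The following is a minimal sketch that mirrors those steps using only the standard library; the helper name salted_sha256 is hypothetical and not part of the script. It illustrates that the salt is random and discarded, so the same input produces a different digest on every call.

```python
import base64
import hashlib
import uuid


def salted_sha256(value):
    """Mirror the hashing steps of hash_lib_encoding() for a single string."""
    # Random 32-character hex salt, generated per call and not stored anywhere
    salt = uuid.uuid4().hex
    b64_value = base64.b64encode(value.encode('ascii')).decode('ascii')
    return hashlib.sha256((salt + b64_value).encode('ascii')).hexdigest()


first = salted_sha256('5577000055770004')
second = salted_sha256('5577000055770004')
print(first)
print(second)
print(first == second)
```

Because the salt is not kept, values masked this way cannot be matched against each other or checked against a known card number later. If you need that, the salted format produced by pbkdf2_sha256.hash() may be the better fit, since it embeds the salt in the output.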
Mask hostnames
The above script can mask the $Hostname field.
If your log records use a different field for the hostname, change the field name in the script’s convert_host() function.
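If you need to mask more than one field, or a field with a different name, the field name can become a parameter. The following is a minimal sketch under several assumptions: FakeEvent is a hypothetical stand-in that only imitates the field_names(), get_field(), and set_field() calls used by the script, mask_fields() and the Computer field are illustrative names, and plain hashlib.sha256 replaces pbkdf2_sha256.hash() only so that the example runs without passlib.

```python
import hashlib


class FakeEvent:
    """Hypothetical stand-in for the event object passed to the script's functions."""

    def __init__(self, fields):
        self._fields = dict(fields)

    def field_names(self):
        return list(self._fields)

    def get_field(self, name):
        return self._fields[name]

    def set_field(self, name, value):
        self._fields[name] = value


def mask_fields(event, names=('Hostname',)):
    """Hash every listed field that is present in the event."""
    for name in names:
        if name in event.field_names():
            value = str(event.get_field(name))
            # Assumption: unsalted SHA-256 is used here for illustration only
            digest = hashlib.sha256(value.encode('utf-8')).hexdigest()
            event.set_field(name, digest)


event = FakeEvent({'Computer': 'NXLog-Server-1', 'Message': 'Started NXLog daemon.'})
mask_fields(event, names=('Hostname', 'Computer'))
print(event.get_field('Computer'))
print(event.get_field('Message'))
```

In the actual script, a wrapper such as a convert_host(event) function that calls mask_fields(event, ('Computer',)) would keep the single-argument shape that the script’s other functions use.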
This configuration reads syslog messages from a file and parses them into structured data using the parse_syslog() procedure of the xm_syslog module.
This procedure populates the $Hostname field, required by the convert_host() function, with the hostname parsed from the syslog message.
It then uses the python_call() procedure of the xm_python module to mask the hostname with the convert_host() function provided by the script.
Finally, it converts the record to JSON format using the xm_json module.
<Extension json>
    Module xm_json
</Extension>

<Extension syslog>
    Module xm_syslog
</Extension>

<Extension python>
    Module xm_python
    PythonCode '/path/to/hasher.py'
</Extension>

<Input messages>
    Module im_file
    File '/var/log/syslog'
    <Exec>
        parse_syslog();
        python_call('convert_host');
        to_json();
    </Exec>
</Input>
The following is a syslog message collected from a Linux host.
Oct 14 16:17:48 NXLog-Server-1 systemd[1]: Started NXLog daemon.
The following JSON object shows the same log record after NXLog Agent processed it.
{
"EventReceivedTime": "2023-10-14T16:18:54.822255+02:00",
"SourceModuleName": "messages",
"SourceModuleType": "im_file",
"SyslogFacilityValue": 1,
"SyslogFacility": "USER",
"SyslogSeverityValue": 5,
"SyslogSeverity": "NOTICE",
"SeverityValue": 2,
"Severity": "INFO",
"Hostname": "$pbkdf2-sha256$29000$xNibc07JeU.JUSplTAkB4A$zs7SqFzo1gKu.gm8cDckjq2EM9Nn9QSdolsSMPi/B8c",
"EventTime": "2023-10-14T16:17:48.000000+02:00",
"SourceName": "systemd",
"ProcessID": 1,
"Message": "Started NXLog daemon."
}
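The masked $Hostname value above follows the string layout that passlib uses for pbkdf2_sha256: the scheme name, the number of rounds, the salt, and the checksum, separated by $ characters. Because the salt is random, hashing the same hostname twice gives two different strings, so masked values cannot be compared for equality. They can still be checked against a candidate value, which passlib supports with pbkdf2_sha256.verify(). The following is a minimal sketch of that check using only the standard library. It assumes passlib’s adapted base64 encoding (a period in place of a plus sign and no padding); all helper names are hypothetical.

```python
import base64
import hashlib
import hmac
import os


def ab64_decode(text):
    """Decode adapted base64: '.' stands for '+', and padding is omitted."""
    padded = text.replace('.', '+') + '=' * (-len(text) % 4)
    return base64.b64decode(padded)


def ab64_encode(raw):
    """Encode bytes as adapted base64."""
    return base64.b64encode(raw).decode('ascii').rstrip('=').replace('+', '.')


def parse_hash(hashed):
    """Split '$pbkdf2-sha256$rounds$salt$checksum' into its parts."""
    _, scheme, rounds, salt, checksum = hashed.split('$')
    return scheme, int(rounds), ab64_decode(salt), ab64_decode(checksum)


def make_hash(value, rounds=29000):
    """Build a hash string in the same layout with a random 16-byte salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', value.encode('utf-8'), salt, rounds)
    return '$pbkdf2-sha256$%d$%s$%s' % (rounds, ab64_encode(salt), ab64_encode(digest))


def matches(candidate, hashed):
    """Return True if candidate produces the checksum stored in hashed."""
    scheme, rounds, salt, checksum = parse_hash(hashed)
    digest = hashlib.pbkdf2_hmac('sha256', candidate.encode('utf-8'), salt, rounds)
    return scheme == 'pbkdf2-sha256' and hmac.compare_digest(digest, checksum)


masked = make_hash('NXLog-Server-1')
print(masked)
print(matches('NXLog-Server-1', masked))
print(matches('NXLog-Server-2', masked))
```

This means a masked hostname still allows anyone who can guess the original name to confirm the guess. Keep that in mind when hostnames follow a predictable naming scheme.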
Mask IP addresses
The above script can mask IP addresses in the $Message field.
Add or change field names in the script’s ipv4_encoding() function to process a different field.
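The matching and replacement logic can be tested on plain strings before you change the script. The following is a minimal sketch that reuses the IPv4 expression from regex_convert() and applies it to two fields of a dictionary. The Details field, the mask_ipv4() helper, and the truncated unsalted SHA-256 digest are assumptions made to keep the example short and free of dependencies; the script itself uses pbkdf2_sha256.hash().

```python
import hashlib
import re

# Same IPv4 pattern as the "ip" branch of regex_convert()
IPV4 = re.compile(
    r"(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
)


def mask_ipv4(text):
    """Replace every IPv4 match with a truncated SHA-256 digest."""
    def _hash(match):
        digest = hashlib.sha256(match.group(0).encode('ascii')).hexdigest()
        return digest[:16]  # shortened for readability (assumption)
    return IPV4.sub(_hash, text)


record = {
    "Message": "local host: 192.168.1.120:12345 () remote host: 192.168.1.122:62722 ()",
    "Details": "Blocked 10.0.0.5 after 3 attempts",
}
for field in ("Message", "Details"):
    record[field] = mask_ipv4(record[field])
print(record)
```

Using re.sub() with a callback replaces each match in place, which avoids a separate find-then-replace pass. Note that an unsalted digest of an IPv4 address is easy to reverse by trying all addresses, so a salted scheme remains preferable for real data.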
This configuration reads log records in JSON format from a file and parses them into structured data using the parse_json() procedure of the xm_json module.
This procedure populates the $Message field required by the ipv4_encoding() function.
It then uses the python_call() procedure of the xm_python module to mask any IP addresses in the message with the ipv4_encoding() function provided by the script.
Finally, it converts the record back to JSON format.
<Extension json>
    Module xm_json
</Extension>

<Extension python>
    Module xm_python
    PythonCode '/path/to/hasher.py'
</Extension>

<Input security_log>
    Module im_file
    File '/path/to/security/log'
    <Exec>
        parse_json();
        python_call('ipv4_encoding');
        to_json();
    </Exec>
</Input>
The following is a log record in JSON format, containing a $Message field with two IPv4 addresses.
{
"EventTime": "Sat Oct 14 16:50:30 2023",
"EventType": "Security Warning",
"Message": "Error: Protocol error (-21), Protocol switch to TCP rejected, close connection (HTTP status code 403, Forbidden) [http_plg.c 5678] local host: 192.168.1.120:12345 () remote host: 192.168.1.122:62722 ()"
}
The following JSON object shows the same log record after NXLog Agent processed it.
{
"EventReceivedTime": "2023-10-14T16:51:26.549994+02:00",
"SourceModuleName": "security_log",
"SourceModuleType": "im_file",
"EventTime": "Sat Oct 14 16:50:30 2023",
"EventType": "Security Warning",
"Message": "Error: Protocol error (-21), Protocol switch to TCP rejected, close connection (HTTP status code 403, Forbidden) [http_plg.c 5678] local host: $pbkdf2-sha256$29000$9t4bg5DSupdyLqW09p7TOg$8vMWXRO5vT9HrwoHnYZgVUMc/Dgk8IPxkclcZtF25YY:12345 () remote host: $pbkdf2-sha256$29000$FQJg7L3XGoPQmhMipDSG8A$/lJ1/iSMVwHIsEnbLBxV7HE7EGvuHDvnhQpvLZ8F4nA:62722 ()"
}
Mask credit card numbers
The above script can mask MasterCard debit and credit card numbers in $Message fields containing the text cc.
Change the script’s pass_lib_encoding() function to customize the field name or identifying text.
You can change or add credit card providers by modifying the script’s regex_convert() function.
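One way to support more card providers is to keep the patterns in a table and combine them into a single expression. The following is a minimal sketch of that idea. The MasterCard entry is the expression from regex_convert(); the Visa entry, the word-boundary anchors, and the names CARD_PATTERNS and find_card_numbers() are assumptions added for illustration.

```python
import re

# Hypothetical pattern table; only the MasterCard entry comes from the script
CARD_PATTERNS = {
    "mastercard": r"(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}",
    # Assumption: Visa numbers start with 4 and have 13 or 16 digits
    "visa": r"4[0-9]{12}(?:[0-9]{3})?",
}

# Word boundaries keep the patterns from matching inside longer digit runs
CARD_REGEX = re.compile(r"\b(?:" + "|".join(CARD_PATTERNS.values()) + r")\b")


def find_card_numbers(content):
    """Return every substring that matches one of the card patterns."""
    return CARD_REGEX.findall(content)


sample = "INSERT INTO cc_details VALUES(1234, 5577000055770004, 2026, 4111111111111111);"
print(find_card_numbers(sample))
```

In the script, the "cc" branch of regex_convert() could return the result of a function like this one, leaving pass_lib_encoding() unchanged.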
This configuration reads SQL audit log records from a file and copies each record from the $raw_event field to the $Message field required by the pass_lib_encoding() function.
It then uses the python_call() procedure of the xm_python module to mask any credit card numbers in the message with the pass_lib_encoding() function provided by the script.
Finally, it writes the masked message back to the $raw_event field.
<Extension python>
    Module xm_python
    PythonCode '/path/to/hasher.py'
</Extension>

<Input audit_log>
    Module im_file
    File '/path/to/audit/log'
    <Exec>
        $Message = $raw_event;
        python_call('pass_lib_encoding');
        $raw_event = $Message;
    </Exec>
</Input>
The following is a SQL audit log record containing credit card details.
2023-10-14 17:54:32|TERMINAL1|User1|cc_details|INSERT INTO cc_details VALUES(1234, 5577000055770004, 2026, 326, 2023, 4, 'John', 'Doe', '123 8th Avenue', 'New York', 'NY', '10019', 'US');
The following is the same log record after NXLog Agent processed it.
2023-10-14 17:54:32|TERMINAL1|User1|cc_details|INSERT INTO cc_details VALUES(1234, $pbkdf2-sha256$29000$ZIzRGuOcs3aOUapVSgmh9A$g./AOB07kTCEzYwBlcs822APRXr5swHEztZhaAcGrFA, 2026, 326, 2023, 4, 'John', 'Doe', '123 8th Avenue', 'New York', 'NY', '10019', 'US');