Amazon S3
This add-on has been replaced by im_amazons3 and om_amazons3 as of NXLog Enterprise Edition version 5.7 and is no longer supported.
NXLog can both receive logs from and send logs to Amazon S3 cloud storage. This is done with the NXLog Python modules for input and output (im_python and om_python) together with Boto3, the AWS SDK for Python. For more information about Boto3, see AWS SDK for Python (Boto3) on Amazon AWS.
Setting up Boto3
- Boto3 can be installed with pip or the system package manager.

  pip:

  pip install boto3

  APT on a Debian-based distribution:

  apt-get install python-boto3

  Yum on a Red Hat-based distribution:

  yum install python2-boto3

  The python2-boto3 package requires the installation of the EPEL repository.

- Make sure an AWS service account has been created.

- Set the default region and credentials in ~/.aws/. This can be done interactively if the AWS CLI is installed, or by editing the files shown below. Credentials for the AWS account can be found in the IAM Console. A new user can be created, or an existing user can be used. Go to "manage access keys" and generate a new set of keys. More information about the initial setup and the credentials can be found in the Boto3 Quickstart and Credentials documents.

  ~/.aws/config:

  [default]
  region=eu-central-1

  ~/.aws/credentials:

  [default]
  aws_access_key_id = YOUR_ACCESS_KEY
  aws_secret_access_key = YOUR_SECRET_KEY

  The region and credential configuration can also be hardcoded in the scripts, but this is not recommended.
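A quick way to confirm that Boto3 picks up the region and credentials is to list the buckets in the account. This is only a minimal verification sketch, not part of the add-on scripts.

import boto3

# Create an S3 client using the default profile from ~/.aws/config
# and ~/.aws/credentials.
client = boto3.client("s3")

# Listing the buckets succeeds only if the credentials are valid.
response = client.list_buckets()
for bucket in response["Buckets"]:
    print(bucket["Name"])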
AWS S3 buckets, objects, keys, and structure
Amazon S3 stores objects inside containers called buckets. The number of buckets available to a user is limited, but each bucket can store a virtually unlimited number of objects. More general information about Amazon S3 can be found at Getting Started with Amazon Simple Storage Service on Amazon AWS.
Both the input and output Python scripts interact with a single bucket on Amazon S3. The scripts will not create, delete, or alter the bucket or any of its properties, permissions, or management options. It is the responsibility of the user to create the bucket, provide the appropriate permissions (ACL), and further configure any lifecycle, replication, encryption, or other options. Similarly, the scripts do not alter the storage class of the objects stored or any other properties or permissions.
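If the bucket does not exist yet, it can be created in the AWS Console, with the AWS CLI, or with Boto3 as in the following sketch. The bucket name and region are placeholders; bucket names must be globally unique and lowercase.

import boto3

client = boto3.client("s3")

# Create the bucket in the desired region ("mybucket" is a placeholder).
# Buckets outside us-east-1 need an explicit LocationConstraint.
client.create_bucket(
    Bucket="mybucket",
    CreateBucketConfiguration={"LocationConstraint": "eu-central-1"},
)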
We selected a schema where events are stored in a single bucket and each object has a key that references the server (or service) name, the date, and the event received time. Although Amazon S3 uses a flat structure to store objects, objects with similar key prefixes are grouped together, resembling the structure of a file system. The following is a visual representation of the naming scheme used. Note that the key name at the deepest level represents a time; however, since Amazon S3 treats the colon (:) as a special character, the dot (.) character is used instead to avoid escaping.
- MYBUCKET/
  - SERVER01/
    - 2018-05-17/
      - 12.36.34.1
      - 12.36.35.1
    - 2018-05-18/
      - 10.46.34.1
      - 10.46.35.1
      - 10.46.35.2
      - 10.46.36.1
  - SERVER02/
    - 2018-05-16/
      - 14.23.12.1
    - 2018-05-17/
      - 17.03.52.1
      - 17.03.52.2
      - 17.03.52.3
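For illustration, a key following this scheme could be built as in the sketch below. The helper and variable names are illustrative and are not taken from the add-on scripts.

from datetime import datetime

SERVER = "SERVER01"  # placeholder server or service name

def make_key(event_time, counter):
    # The time portion uses dots instead of colons, since the colon is
    # treated as a special character by Amazon S3 under this scheme.
    date_part = event_time.strftime("%Y-%m-%d")
    time_part = event_time.strftime("%H.%M.%S")
    return "%s/%s/%s.%d" % (SERVER, date_part, time_part, counter)

# Example output: SERVER01/2018-05-17/12.36.34.1
print(make_key(datetime(2018, 5, 17, 12, 36, 34), 1))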
Sending logs to Amazon S3
Logs can be sent to Amazon S3 cloud object storage as follows. Log events are stored in the Amazon S3 bucket with object key names composed of the server name, the date in YYYY-MM-DD format, the time in HH.MM.SS format, and a counter (since multiple events can be received during the same second).

- Copy the s3_write.py script to a location that is accessible by NXLog.
- Edit the BUCKET and SERVER variables in the code.
- Configure NXLog with an om_python instance.
This configuration reads raw events from a file with im_file and uses om_python to forward them, without any additional processing, to the configured S3 storage.
<Input file>
    Module          im_file
    File            "input.log"

    # These may be helpful for testing
    SavePos         FALSE
    ReadFromLast    FALSE
</Input>

<Output s3>
    Module          om_python
    PythonCode      s3_write.py
</Output>

<Route file_to_s3>
    Path            file => s3
</Route>
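For reference, the following is a minimal sketch of the output side of this setup, assuming a write_data(event) entry point called by om_python for each event; the bucket name, server name, and counter handling are placeholders, so consult the shipped s3_write.py for the actual implementation.

import boto3
from datetime import datetime

BUCKET = "mybucket"   # placeholder, bucket names must be lowercase
SERVER = "SERVER01"   # placeholder server or service name

client = boto3.client("s3")
last = {"prefix": None, "count": 0}  # counter for events in the same second


def write_data(event):
    # Assumed om_python entry point; called once per event.
    raw = event.get_field("raw_event")
    now = datetime.now()
    prefix = "%s/%s/%s" % (SERVER, now.strftime("%Y-%m-%d"),
                           now.strftime("%H.%M.%S"))
    # Increment the counter when several events arrive in the same second.
    if prefix == last["prefix"]:
        last["count"] += 1
    else:
        last["prefix"] = prefix
        last["count"] = 1
    key = "%s.%d" % (prefix, last["count"])
    client.put_object(Body=raw, Bucket=BUCKET, Key=key)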
Retrieving logs from Amazon S3
Logs can be retrieved from Amazon S3 cloud object storage as follows.
The script keeps track of the last object retrieved from Amazon S3 by means of a file called lastkey.log, which is stored locally. Even in the event of an abnormal termination, the script will continue from where it stopped. The lastkey.log file can be deleted to reset that behavior (or edited if necessary).

- Copy the s3_read.py script to a location that is accessible by NXLog.
- Edit the BUCKET, SERVER, and POLL_INTERVAL variables in the code. POLL_INTERVAL is the time the script waits before checking again for new events. The MAXKEYS variable should be fine in all cases with the default value of 1000 keys.
- Configure NXLog with an im_python instance.
This configuration collects events from the configured S3 storage with im_python and writes the raw events to file with om_file (without performing any additional processing).
<Input s3>
    Module          im_python
    PythonCode      s3_read.py
</Input>

<Output file>
    Module          om_file
    File            "output.log"
</Output>

<Route s3_to_file>
    Path            s3 => file
</Route>
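The reading side can be outlined as follows. This is only a sketch of the polling and lastkey.log tracking described above; the read_data(module) entry point and the module methods used (logdata_new, set_read_timer) are assumptions here, so consult the shipped s3_read.py for the exact interface.

import boto3

BUCKET = "mybucket"       # placeholder
SERVER = "SERVER01"       # placeholder
POLL_INTERVAL = 30        # seconds to wait between checks for new objects
MAXKEYS = 1000            # maximum keys fetched per listing request
LASTKEY_FILE = "lastkey.log"

client = boto3.client("s3")


def load_lastkey():
    # Return the last processed key, or an empty string on the first run.
    try:
        with open(LASTKEY_FILE) as f:
            return f.read().strip()
    except IOError:
        return ""


def save_lastkey(key):
    with open(LASTKEY_FILE, "w") as f:
        f.write(key)


def read_data(module):
    # Assumed im_python entry point.
    lastkey = load_lastkey()
    kwargs = {"Bucket": BUCKET, "Prefix": SERVER + "/", "MaxKeys": MAXKEYS}
    if lastkey:
        kwargs["StartAfter"] = lastkey
    response = client.list_objects_v2(**kwargs)
    for obj in response.get("Contents", []):
        body = client.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        logdata = module.logdata_new()
        logdata.set_field("raw_event", body)
        logdata.post()
        lastkey = obj["Key"]
    if lastkey:
        save_lastkey(lastkey)
    # Check again for new objects after POLL_INTERVAL seconds.
    module.set_read_timer(POLL_INTERVAL)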
Serialization and compression
In the previous examples, only the $raw_event field was stored in the objects. An easy way to store more than one field is to "pickle" (or "serialize" or "marshal") all the fields of an event.
import pickle

# Collect all fields of the event into a dictionary.
fields = {}
for field in event.get_names():
    fields.update({field: event.get_field(field)})

# Serialize the dictionary and store it as the object body.
newraw = pickle.dumps(fields)
client.put_object(Body=newraw, Bucket=BUCKET, Key=key)
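On the retrieval side, the fields can be restored from the pickled object body and set back on the new event. In this sketch, newraw stands for the object body read from S3 and logdata for the event created in the reading script.

import pickle

# Restore the dictionary of fields and copy them onto the event.
fields = pickle.loads(newraw)
for name, value in fields.items():
    logdata.set_field(name, value)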
Compressing the events with gzip is also possible.
import StringIO  # Python 2; io.BytesIO is the Python 3 equivalent
import gzip

# Compress the serialized event before uploading it.
out = StringIO.StringIO()
with gzip.GzipFile(fileobj=out, mode="w") as f:
    f.write(newraw)
gzallraw = out.getvalue()
client.put_object(Body=gzallraw, Bucket=BUCKET, Key=key)
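The matching decompression on the read side could look like this; gzallraw stands for the compressed object body retrieved from S3.

import StringIO  # Python 2; io.BytesIO is the Python 3 equivalent
import gzip

# Decompress the object body back into the pickled form.
buf = StringIO.StringIO(gzallraw)
with gzip.GzipFile(fileobj=buf, mode="r") as f:
    newraw = f.read()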