Amazon S3
This add-on has been replaced by im_amazons3 and om_amazons3 as of NXLog Enterprise Edition version 5.7 and is no longer supported.
NXLog can both receive logs from and send logs to Amazon S3 cloud storage. This is done with the NXLog Python modules for input and output (im_python and om_python) together with Boto3, the AWS SDK for Python. For more information about Boto3, see AWS SDK for Python (Boto3) on Amazon AWS.
Setting up Boto3
- Boto3 can be installed with pip or the system package manager.

  pip:

  pip install boto3

  APT on a Debian-based distribution:

  apt-get install python-boto3

  Yum on a Red Hat-based distribution:

  yum install python2-boto3

  The python2-boto3 package requires the installation of the EPEL repository.

- Make sure an AWS service account has been created.

- Set the default region and credentials in ~/.aws/. This can be done interactively if the AWS CLI is installed, or by editing the files shown below. Credentials for the AWS account can be found in the IAM Console. A new user can be created, or an existing user can be used. Go to "manage access keys" and generate a new set of keys. More information about the initial setup and the credentials can be found in the Boto3 Quickstart and Credentials documents.

  ~/.aws/config:

  [default]
  region=eu-central-1

  ~/.aws/credentials:

  [default]
  aws_access_key_id = YOUR_ACCESS_KEY
  aws_secret_access_key = YOUR_SECRET_KEY

  The region and credential configuration can also be hardcoded in the scripts, but this is not recommended.
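A quick way to confirm that Boto3 picks up the region and credentials is to list the buckets in the account. This is only a minimal verification sketch, not part of the add-on scripts.

import boto3

# Create an S3 client using the default profile from ~/.aws/config
# and ~/.aws/credentials.
client = boto3.client("s3")

# Listing the buckets succeeds only if the credentials are valid.
response = client.list_buckets()
for bucket in response["Buckets"]:
    print(bucket["Name"])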
AWS S3 buckets, objects, keys, and structure
Amazon S3 stores objects inside containers called buckets. The number of buckets available to a user is limited, but each bucket can store a virtually unlimited number of objects. More general information about Amazon S3 can be found at Getting Started with Amazon Simple Storage Service on Amazon AWS.
Both the input and output Python scripts interact with a single bucket on Amazon S3. The scripts will not create, delete, or alter the bucket or any of its properties, permissions, or management options. It is the responsibility of the user to create the bucket, provide the appropriate permissions (ACL), and further configure any lifecycle, replication, encryption, or other options. Similarly, the scripts do not alter the storage class of the objects stored or any other properties or permissions.
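If the bucket does not exist yet, it can be created in the AWS Console, with the AWS CLI, or with Boto3 as in the following sketch. The bucket name and region are placeholders; bucket names must be globally unique and lowercase.

import boto3

client = boto3.client("s3")

# Create the bucket in the desired region ("mybucket" is a placeholder).
# Buckets outside us-east-1 need an explicit LocationConstraint.
client.create_bucket(
    Bucket="mybucket",
    CreateBucketConfiguration={"LocationConstraint": "eu-central-1"},
)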
We selected a schema where events are stored in a single bucket and each object has a key that references the server (or service) name, the date, and the event received time. Although Amazon S3 uses a flat structure to store objects, objects with similar key prefixes are grouped together, resembling the structure of a file system. The following is a visual representation of the naming scheme used. Note that the key name at the deepest level represents a time; however, since Amazon S3 treats the colon (:) as a special character, the dot (.) character is used instead to avoid escaping.
- MYBUCKET/
  - SERVER01/
    - 2018-05-17/
      - 12.36.34.1
      - 12.36.35.1
    - 2018-05-18/
      - 10.46.34.1
      - 10.46.35.1
      - 10.46.35.2
      - 10.46.36.1
  - SERVER02/
    - 2018-05-16/
      - 14.23.12.1
    - 2018-05-17/
      - 17.03.52.1
      - 17.03.52.2
      - 17.03.52.3
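For illustration, a key following this scheme could be built as in the sketch below. The helper and variable names are illustrative and are not taken from the add-on scripts.

from datetime import datetime

SERVER = "SERVER01"  # placeholder server or service name

def make_key(event_time, counter):
    # The time portion uses dots instead of colons, since the colon is
    # treated as a special character by Amazon S3 under this scheme.
    date_part = event_time.strftime("%Y-%m-%d")
    time_part = event_time.strftime("%H.%M.%S")
    return "%s/%s/%s.%d" % (SERVER, date_part, time_part, counter)

# Example output: SERVER01/2018-05-17/12.36.34.1
print(make_key(datetime(2018, 5, 17, 12, 36, 34), 1))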
Sending logs to Amazon S3
Logs can be sent to Amazon S3 cloud object storage as follows. Log events are stored in the Amazon S3 bucket with object key names composed of the server name, the date in YYYY-MM-DD format, the time in HH.MM.SS format, and a counter (since multiple events can be received during the same second).

- Copy the s3_write.py script to a location that is accessible by NXLog.
- Edit the BUCKET and SERVER variables in the code.
- Configure NXLog with an om_python instance.
This configuration reads raw events from a file with im_file and uses om_python to forward them, without any additional processing, to the configured S3 storage.
<Input file>
    Module          im_file
    File            "input.log"

    # These may be helpful for testing
    SavePos         FALSE
    ReadFromLast    FALSE
</Input>

<Output s3>
    Module          om_python
    PythonCode      s3_write.py
</Output>

<Route file_to_s3>
    Path            file => s3
</Route>
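For reference, the following is a minimal sketch of the output side of this setup, assuming a write_data(event) entry point called by om_python for each event; the bucket name, server name, and counter handling are placeholders, so consult the shipped s3_write.py for the actual implementation.

import boto3
from datetime import datetime

BUCKET = "mybucket"   # placeholder, bucket names must be lowercase
SERVER = "SERVER01"   # placeholder server or service name

client = boto3.client("s3")
last = {"prefix": None, "count": 0}  # counter for events in the same second


def write_data(event):
    # Assumed om_python entry point; called once per event.
    raw = event.get_field("raw_event")
    now = datetime.now()
    prefix = "%s/%s/%s" % (SERVER, now.strftime("%Y-%m-%d"),
                           now.strftime("%H.%M.%S"))
    # Increment the counter when several events arrive in the same second.
    if prefix == last["prefix"]:
        last["count"] += 1
    else:
        last["prefix"] = prefix
        last["count"] = 1
    key = "%s.%d" % (prefix, last["count"])
    client.put_object(Body=raw, Bucket=BUCKET, Key=key)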
Retrieving logs from Amazon S3
Logs can be retrieved from Amazon S3 cloud object storage as follows.
The script keeps track of the last object retrieved from Amazon S3 by means of a file called lastkey.log, which is stored locally. Even in the event of an abnormal termination, the script will continue from where it stopped. The lastkey.log file can be deleted to reset that behavior (or edited if necessary).

- Copy the s3_read.py script to a location that is accessible by NXLog.
- Edit the BUCKET, SERVER, and POLL_INTERVAL variables in the code. POLL_INTERVAL is the time the script waits before checking again for new events. The MAXKEYS variable should be fine in all cases with the default value of 1000 keys.
- Configure NXLog with an im_python instance.
This configuration collects events from the configured S3 storage with im_python and writes the raw events to file with om_file (without performing any additional processing).
<Input s3>
    Module          im_python
    PythonCode      s3_read.py
</Input>

<Output file>
    Module          om_file
    File            "output.log"
</Output>

<Route s3_to_file>
    Path            s3 => file
</Route>
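The reading side can be outlined as follows. This is only a sketch of the polling and lastkey.log tracking described above; the read_data(module) entry point and the module methods used (logdata_new, set_read_timer) are assumptions here, so consult the shipped s3_read.py for the exact interface.

import boto3

BUCKET = "mybucket"       # placeholder
SERVER = "SERVER01"       # placeholder
POLL_INTERVAL = 30        # seconds to wait between checks for new objects
MAXKEYS = 1000            # maximum keys fetched per listing request
LASTKEY_FILE = "lastkey.log"

client = boto3.client("s3")


def load_lastkey():
    # Return the last processed key, or an empty string on the first run.
    try:
        with open(LASTKEY_FILE) as f:
            return f.read().strip()
    except IOError:
        return ""


def save_lastkey(key):
    with open(LASTKEY_FILE, "w") as f:
        f.write(key)


def read_data(module):
    # Assumed im_python entry point.
    lastkey = load_lastkey()
    kwargs = {"Bucket": BUCKET, "Prefix": SERVER + "/", "MaxKeys": MAXKEYS}
    if lastkey:
        kwargs["StartAfter"] = lastkey
    response = client.list_objects_v2(**kwargs)
    for obj in response.get("Contents", []):
        body = client.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        logdata = module.logdata_new()
        logdata.set_field("raw_event", body)
        logdata.post()
        lastkey = obj["Key"]
    if lastkey:
        save_lastkey(lastkey)
    # Check again for new objects after POLL_INTERVAL seconds.
    module.set_read_timer(POLL_INTERVAL)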
Serialization and compression
In the previous examples, only the $raw_event field was stored in the objects. An easy way to store more than one field is to "pickle" (or "serialize" or "marshal") all the fields of an event.
import pickle

# Collect all fields of the event into a dictionary.
fields = {}
for field in event.get_names():
    fields.update({field: event.get_field(field)})

# Serialize the dictionary and store it as the object body.
newraw = pickle.dumps(fields)
client.put_object(Body=newraw, Bucket=BUCKET, Key=key)
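On the retrieval side, the fields can be restored from the pickled object body and set back on the new event. In this sketch, newraw stands for the object body read from S3 and logdata for the event created in the reading script.

import pickle

# Restore the dictionary of fields and copy them onto the event.
fields = pickle.loads(newraw)
for name, value in fields.items():
    logdata.set_field(name, value)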
Compressing the events with gzip is also possible.
import StringIO  # Python 2; io.BytesIO is the Python 3 equivalent
import gzip

# Compress the serialized event before uploading it.
out = StringIO.StringIO()
with gzip.GzipFile(fileobj=out, mode="w") as f:
    f.write(newraw)
gzallraw = out.getvalue()
client.put_object(Body=gzallraw, Bucket=BUCKET, Key=key)
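The matching decompression on the read side could look like this; gzallraw stands for the compressed object body retrieved from S3.

import StringIO  # Python 2; io.BytesIO is the Python 3 equivalent
import gzip

# Decompress the object body back into the pickled form.
buf = StringIO.StringIO(gzallraw)
with gzip.GzipFile(fileobj=buf, mode="r") as f:
    newraw = f.read()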