Amazon S3 (im_amazons3)

Amazon Simple Storage Service (S3) is an object storage service offering industry-leading scalability, data availability, security, and performance.

This module can be used to collect logs from Amazon S3 and compatible services.

Amazon S3 buckets, objects, keys, and structure

Amazon S3 stores objects inside containers called buckets. Users can create a limited number of buckets, and each bucket can store an unlimited number of objects. See Getting Started with Amazon S3 in the Amazon S3 User Guide for more information.

Both the input and output modules interact with a single bucket on Amazon S3. The module will not create, delete, or alter the bucket or any of its properties, permissions, or management options. Instead, you must create the bucket, provide the appropriate permissions (ACL), and further configure any lifecycle, replication, encryption, or other options. Similarly, the module does not alter the storage class of the objects stored or any other properties or permissions.

We selected a schema that stores events in a single bucket. Each object has a key that references the server or service name, the date, and the time NXLog Agent received the event. Although Amazon S3 uses a flat structure to store objects, it groups objects with similar key prefixes in a way that resembles a filesystem structure. The following is a visual representation of our naming scheme. Note that the key name at the fourth level represents the time in UTC. Because Amazon S3 treats the colon (:) as a special character, we replace it with the dot (.) character to simplify matters.

  • MYBUCKET/

    • SERVER01/

      • 2018-05-17/

        • 12.36.34.1

        • 12.36.35.1

      • 2018-05-18/

        • 10.46.34.1

        • 10.46.35.1

        • 10.46.35.2

        • 10.46.36.1

    • SERVER02/

      • 2018-05-16/

        • 14.23.12.1

      • 2018-05-17/

        • 17.03.52.1

        • 17.03.52.2

        • 17.03.52.3
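The naming scheme above can be sketched as a small Python function. This is an illustration, not the module's code: the helper name object_key is hypothetical, the bucket name (MYBUCKET) is not part of the key, and the trailing .1, .2, .3 suffix is assumed to be a sequence number that distinguishes objects created within the same second.

```python
from datetime import datetime, timezone


def object_key(server: str, received: datetime, seq: int) -> str:
    """Build a key such as SERVER01/2018-05-17/12.36.34.1 (hypothetical helper)."""
    # Keys encode the time in UTC, so convert timezone-aware timestamps first.
    utc = received.astimezone(timezone.utc)
    # Dots replace the colons of HH:MM:SS, because S3 treats ':' as special.
    time_part = utc.strftime("%H.%M.%S")
    # Assumption: the last component is a per-second sequence number.
    return f"{server}/{utc:%Y-%m-%d}/{time_part}.{seq}"


if __name__ == "__main__":
    ts = datetime(2018, 5, 17, 12, 36, 34, tzinfo=timezone.utc)
    print(object_key("SERVER01", ts, 1))
```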

Configuration

The im_amazons3 module accepts the following directives in addition to the common module directives. The Bucket, Region, and Server directives are required.

The AccessKey and SecretKey directives are required if NXLog Agent is not running in the same tenant as the S3 bucket.

Required directives

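Bucket, Region, and Server must be set for the module to start. PathStyle and URL are described in this section because they affect how the module reaches the bucket, but they are optional.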

Bucket

This mandatory directive specifies the Amazon S3 bucket name.

PathStyle

This boolean directive changes how the module constructs the URL, to support providers such as MinIO that expect the bucket name in the path instead of in a subdomain. It only takes effect when the URL directive is specified, in which case the default is TRUE. If set to FALSE, the module prefixes the host name in the URL with the Bucket as a subdomain. For example, if the URL is https://s3-us-east-2.amazonaws.com and the bucket is mybucket, the resulting URL will be https://mybucket.s3-us-east-2.amazonaws.com.
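A rough Python sketch of the two URL forms, including the HTTPS default described under the URL directive. The function name endpoint_url is hypothetical, and the exact shape of the path-style URL (bucket appended as the first path segment) is an assumption based on how path-style S3 addressing usually works; only the subdomain example above comes from this page.

```python
from urllib.parse import urlsplit, urlunsplit


def endpoint_url(url: str, bucket: str, path_style: bool = True) -> str:
    """Return the base URL for a bucket (illustrative sketch only)."""
    # Without an explicit protocol, fall back to HTTPS.
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    if path_style:
        # Path style (assumed form): the bucket becomes the first path segment.
        path = parts.path.rstrip("/") + "/" + bucket
        return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    # Virtual-hosted style: the bucket becomes a subdomain of the host.
    return urlunsplit((parts.scheme, bucket + "." + parts.netloc, parts.path, "", ""))
```

With path_style=False, endpoint_url("https://s3-us-east-2.amazonaws.com", "mybucket") reproduces the example above.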

Region

This mandatory directive specifies the service region code. It accepts any value when used in conjunction with the URL directive. Otherwise, the following codes are supported:

Provider                  Region                      Code
Amazon                    US East (N. Virginia)       us-east-1
Amazon                    US East (Ohio)              us-east-2
Amazon                    US West (N. California)     us-west-1
Amazon                    US West (Oregon)            us-west-2
Amazon                    Canada (Central)            ca-central-1
Amazon                    Africa (Cape Town)          af-south-1
Amazon                    Asia Pacific (Hong Kong)    ap-east-1
Amazon                    Asia Pacific (Mumbai)       ap-south-1
Amazon                    Asia Pacific (Tokyo)        ap-northeast-1
Amazon                    Asia Pacific (Seoul)        ap-northeast-2
Amazon                    Asia Pacific (Osaka)        ap-northeast-3
Amazon                    Asia Pacific (Singapore)    ap-southeast-1
Amazon                    Asia Pacific (Sydney)       ap-southeast-2
Amazon                    China (Beijing)             cn-north-1
Amazon                    China (Ningxia)             cn-northwest-1
Amazon                    Europe (Stockholm)          eu-north-1
Amazon                    Europe (Frankfurt)          eu-central-1
Amazon                    Europe (Ireland)            eu-west-1
Amazon                    Europe (London)             eu-west-2
Amazon                    Europe (Paris)              eu-west-3
Amazon                    South America (São Paulo)   sa-east-1
Amazon                    Middle East (Bahrain)       me-south-1
DigitalOcean              US East (New York City)     nyc3
DigitalOcean              Europe (Amsterdam)          ams3
DigitalOcean              Asia Pacific (Singapore)    sgp1
DigitalOcean              Europe (Frankfurt)          fra1
Yandex (Object Storage)   Russia                      ru-central1
Wasabi                    US East (N. Virginia)       wa-us-east-1
Wasabi                    US East (N. Virginia)       wa-us-east-2
Wasabi                    US West (Oregon)            wa-us-west-1
Wasabi                    Europe (Amsterdam)          wa-eu-central-1

Server

This mandatory directive sets the object path prefix. The module only reads objects whose names start with the specified value. Use / to read all objects. See Amazon S3 buckets, objects, keys, and structure.
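The filtering rule can be sketched in a few lines of Python. The function name select_objects is hypothetical, and the sketch assumes the module compares the Server value against key names as a plain string prefix, with / treated as "everything".

```python
def select_objects(keys, server):
    """Filter object keys the way the Server directive is described (sketch)."""
    # "/" is the documented way to read every object in the bucket.
    if server == "/":
        return list(keys)
    # Otherwise keep only keys that start with the configured prefix.
    return [key for key in keys if key.startswith(server)]


if __name__ == "__main__":
    keys = [
        "SERVER01/2018-05-17/12.36.34.1",
        "SERVER02/2018-05-16/14.23.12.1",
    ]
    print(select_objects(keys, "SERVER01"))
```

Under this assumption, a prefix is matched literally: SERVER0 selects objects from both SERVER01 and SERVER02, and SERVER01 would also match a key under a hypothetical SERVER010/ prefix.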

URL

This optional directive specifies the URL of a custom endpoint, for example, a self-hosted MinIO server. If the URL does not include a protocol, the module uses HTTPS.

HTTP(S) directives

The following directives are for configuring HTTP(S) connection settings.

AddHeader

This optional directive can be specified multiple times to add custom headers to each HTTP request.

Compression

This optional directive can be used to enable HTTP compression for outgoing HTTP messages. The possible values are none, gzip, and deflate. By default, compression is disabled. Please note that some HTTP servers may not accept compressed HTTP requests. If a server doesn’t support a specific compression method, it may return 415 Unsupported Media Type errors in response to compressed requests.
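To illustrate what the gzip and deflate values mean for a request body, the following Python sketch compresses a payload and sets the standard Content-Encoding header. It is not the module's implementation; the function name compress_body is hypothetical.

```python
import gzip
import zlib


def compress_body(body: bytes, method: str = "none"):
    """Return (payload, extra_headers) for a Compression value (sketch)."""
    if method == "none":
        return body, {}
    if method == "gzip":
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    if method == "deflate":
        # HTTP "deflate" means zlib-wrapped DEFLATE data, which zlib.compress emits.
        return zlib.compress(body), {"Content-Encoding": "deflate"}
    raise ValueError(f"unsupported compression method: {method}")
```

A server that does not understand the Content-Encoding value is the case that produces the 415 Unsupported Media Type response mentioned above.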

HTTPBasicAuthPassword

HTTP basic authorization password.

HTTP authorization works only when both the HTTPBasicAuthUser and HTTPBasicAuthPassword directives are set.

HTTPBasicAuthUser

HTTP basic authorization username.

HTTP authorization works only when both the HTTPBasicAuthUser and HTTPBasicAuthPassword directives are set.
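HTTP basic authorization sends the user name and password as a Base64-encoded user:password pair in the Authorization header. The Python sketch below shows that encoding and the "both must be set" rule; the function name basic_auth_header is hypothetical.

```python
import base64
from typing import Optional


def basic_auth_header(user: Optional[str], password: Optional[str]) -> Optional[str]:
    """Build an Authorization header value, or None if either part is missing."""
    # Mirrors the rule above: both the user name and the password must be set.
    if user is None or password is None:
        return None
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token
```

Base64 is an encoding, not encryption, so these credentials should only be sent over HTTPS.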

HTTPSAllowExpired

Specifies whether the connection should be allowed if the remote host presents an expired certificate. If set to TRUE, the module will connect even if the remote host’s certificate has expired. The default is FALSE: the certificate must not be expired.

HTTPSAllowUntrusted

Specifies if the connection should be allowed without certificate verification. If set to TRUE, the connection will be allowed even if the remote host presents an unknown or self-signed certificate. The default value is FALSE: the remote host must present a trusted certificate.

HTTPSCADir

The path to a directory containing certificate authority (CA) certificates. These certificates will be used to verify the certificate presented by the remote host. The certificate files must be named using the OpenSSL hashed format, i.e., the hash of the certificate followed by .0, .1, and so on. To find the hash of a certificate using OpenSSL:

$ openssl x509 -hash -noout -in ca.crt

For example, if the certificate hash is e2f14e4a, then the certificate filename should be e2f14e4a.0. If there is another certificate with the same hash then it should be named e2f14e4a.1 and so on.
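The suffix rule can be expressed as a short Python sketch. The hash itself still comes from openssl x509 -hash; the function name hashed_cert_name is hypothetical and only picks the first unused suffix.

```python
def hashed_cert_name(cert_hash: str, existing: set) -> str:
    """Pick the first free <hash>.<n> file name (sketch of the naming rule)."""
    n = 0
    # Certificates whose hashes collide get increasing suffixes: .0, .1, ...
    while f"{cert_hash}.{n}" in existing:
        n += 1
    return f"{cert_hash}.{n}"
```

OpenSSL also ships an openssl rehash command that creates these hash-named links for a whole directory.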

A remote host’s self-signed certificate (which is not signed by a CA) can also be trusted by including a copy of the certificate in this directory.

The default operating system root certificate store will be used if this directive is not specified. Unix-like operating systems commonly store root certificates in /etc/ssl/certs. Windows operating systems use the Windows Certificate Store, while macOS uses the Keychain Access Application as the default certificate store. See Certification Authority (CA) certificates in the NXLog Platform User Guide for more information on using this directive.

In addition, Microsoft’s PKI repository contains root certificates for Microsoft services.

HTTPSCAFile

The path of the certificate authority (CA) certificate that will be used to verify the certificate presented by the remote host. A remote host’s self-signed certificate (which is not signed by a CA) can be trusted by specifying the remote host certificate itself. In case of certificates signed by an intermediate CA, the certificate specified must contain the complete certificate chain (certificate bundle).

HTTPSCertFile

The path of the certificate file that will be presented to the remote host during the HTTPS handshake.

HTTPSCertKeyFile

The path of the private key file that was used to generate the certificate specified by the HTTPSCertFile directive. This is used for the HTTPS handshake.
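As a rough analogy, the HTTPS directives map onto Python's standard ssl module as sketched below. This is not how the module is implemented. The function name build_tls_context is hypothetical, and the sketch always keeps the system trust store and adds to it, which may differ from the module's behavior when HTTPSCADir or HTTPSCAFile is set. Python's ssl module has no setting that accepts expired certificates while still verifying everything else, so HTTPSAllowExpired is left out.

```python
import ssl
from typing import Optional


def build_tls_context(ca_file: Optional[str] = None,
                      ca_dir: Optional[str] = None,
                      cert_file: Optional[str] = None,
                      key_file: Optional[str] = None,
                      allow_untrusted: bool = False) -> ssl.SSLContext:
    """Rough Python analogue of the HTTPS* directives (not the module's code)."""
    # Start from the operating system trust store, like the module's default.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    if ca_file or ca_dir:
        # HTTPSCAFile / HTTPSCADir: extra CAs (the directory needs hashed names).
        ctx.load_verify_locations(cafile=ca_file, capath=ca_dir)
    if cert_file:
        # HTTPSCertFile / HTTPSCertKeyFile: client certificate and its private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if allow_untrusted:
        # HTTPSAllowUntrusted=TRUE: skip verification (hostname check must go first).
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```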

Proxy

This optional directive is used to specify the protocol, IP address (or hostname) and port number of the HTTP or SOCKS proxy host to be used. The format is protocol://hostname:port.
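The protocol://hostname:port format can be checked with Python's standard URL parser, as in the sketch below. The function name parse_proxy is hypothetical, and the scheme names used in the usage line (http, socks5) are illustrative assumptions rather than a list of values the module accepts.

```python
from urllib.parse import urlsplit


def parse_proxy(value: str):
    """Split a Proxy value of the form protocol://hostname:port (sketch)."""
    parts = urlsplit(value)
    # All three components are required by the documented format.
    if not parts.scheme or not parts.hostname or parts.port is None:
        raise ValueError(f"expected protocol://hostname:port, got {value!r}")
    return parts.scheme, parts.hostname, parts.port
```

For example, parse_proxy("socks5://10.0.0.1:1080") returns ("socks5", "10.0.0.1", 1080).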

Reconnect

This optional directive sets the reconnect interval in seconds. If it is set, the module attempts to reconnect at this fixed interval. If it is not set, the reconnect interval starts at 1 second and doubles with every attempt. If a successful connection lasts longer than the current reconnect interval, the interval is reset to 1 second.

The Reconnect directive must be used with caution. If it is used on multiple systems, it can send reconnect requests simultaneously to the same destination, potentially overloading the destination system. It may also cause NXLog Agent to use unusually high system resources or cause NXLog Agent to become unresponsive.
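The two reconnect behaviors can be modeled with a small Python class. The class and method names are hypothetical, and the sketch interprets "current reconnect interval" as the most recent delay used, which is an assumption.

```python
from typing import Optional


class ReconnectSchedule:
    """Sketch of the reconnect intervals described for the Reconnect directive."""

    def __init__(self, reconnect: Optional[float] = None):
        self.fixed = reconnect  # Reconnect value in seconds, or None if unset
        self.current = 0        # last delay used; 0 means no attempt yet

    def next_delay(self) -> float:
        if self.fixed is not None:
            # Reconnect set: always wait the same number of seconds.
            return self.fixed
        # Reconnect unset: 1 second, then double on every attempt.
        self.current = 1 if self.current == 0 else self.current * 2
        return self.current

    def connection_lost(self, connected_for: float) -> None:
        # A connection that outlived the current interval resets the backoff.
        if self.fixed is None and connected_for > self.current:
            self.current = 0
```

The doubling schedule naturally spreads out retries after repeated failures, whereas a fixed interval keeps retrying at the same rate, which is why the fixed setting carries the caution above.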

ReconnectOnData

This optional directive defines the behavior when the connection with the remote host is lost. When set to TRUE, the module only attempts to reconnect when it has data to send. The default value is FALSE: the module always keeps a connection open with the remote host.

Optional directives

AccessKey

This optional directive specifies the AWS public access key ID. If AccessKey and SecretKey are missing, the module will try to read the credentials from the environment, STS, profile, or instance metadata. If none are available, the module will try to log in anonymously.
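The fallback chain can be sketched in Python as follows. Everything here is an assumption for illustration: the function name resolve_credentials is hypothetical, the sources are assumed to be tried in the order listed above, the environment variable names are the standard AWS ones (this page does not name them), and STS, profile, and instance metadata lookups are represented by hypothetical provider callables.

```python
import os
from typing import Callable, Mapping, Optional, Sequence, Tuple

Credentials = Tuple[str, str]


def resolve_credentials(access_key: Optional[str],
                        secret_key: Optional[str],
                        env: Mapping[str, str] = os.environ,
                        fallbacks: Sequence[Callable[[], Optional[Credentials]]] = ()
                        ) -> Optional[Credentials]:
    """Return (access_key, secret_key), or None for anonymous access (sketch)."""
    # 1. Explicit AccessKey and SecretKey directives take precedence.
    if access_key and secret_key:
        return access_key, secret_key
    # 2. Environment (assumed: the standard AWS variable names).
    env_id = env.get("AWS_ACCESS_KEY_ID")
    env_secret = env.get("AWS_SECRET_ACCESS_KEY")
    if env_id and env_secret:
        return env_id, env_secret
    # 3. Hypothetical providers standing in for STS, profile, and instance metadata.
    for provider in fallbacks:
        creds = provider()
        if creds:
            return creds
    # 4. Nothing found: fall back to anonymous access.
    return None
```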

InputType

See the InputType directive in the list of common module directives. If this directive is not specified, the default is LineBased (the module uses CRLF as the record terminator on Windows and LF on Unix).

This directive also supports data converters, see the description in the InputType section.
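A minimal Python sketch of line-based splitting, where each resulting record would become one $raw_event. The function name split_records and the UTF-8 decoding are assumptions; the sketch only shows the platform-dependent terminator described above.

```python
def split_records(data: bytes, windows: bool) -> list:
    """Split object content into records by the platform's line terminator (sketch)."""
    terminator = b"\r\n" if windows else b"\n"
    records = data.split(terminator)
    # A trailing terminator leaves an empty last element; drop it.
    if records and records[-1] == b"":
        records.pop()
    # Assumption: object content is UTF-8 text.
    return [record.decode("utf-8") for record in records]
```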

PollInterval

This optional directive specifies, in seconds, how frequently the module checks for new events. If this directive is not specified, it defaults to 60 seconds.

ReadFromLast

This optional boolean directive instructs the module to only read logs that arrive after NXLog Agent is started. This directive comes into effect if a saved position is not found, for example on the first start, or when the SavePos directive is FALSE. When the SavePos directive is TRUE and a previously saved position is found, the module will always resume reading from the saved position. If ReadFromLast is FALSE, the module will read all the available logs. This can result in a lot of messages and is usually not the expected behavior. If this directive is not specified, it defaults to TRUE.

The following matrix shows the outcome of this directive in conjunction with the SavePos directive:

ReadFromLast  SavePos  Saved position  Outcome
TRUE          TRUE     No              Reads events that are logged after NXLog Agent is started.
TRUE          TRUE     Yes             Reads events from the saved position.
TRUE          FALSE    No              Reads events that are logged after NXLog Agent is started.
TRUE          FALSE    Yes             Reads events that are logged after NXLog Agent is started.
FALSE         TRUE     No              Reads all events.
FALSE         TRUE     Yes             Reads events from the saved position.
FALSE         FALSE    No              Reads all events.
FALSE         FALSE    Yes             Reads all events.
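The eight rows of the matrix reduce to two checks, sketched below in Python. The function name start_position and its return values are hypothetical labels for the three outcomes, and the sketch ignores the StartFrom directive.

```python
def start_position(read_from_last: bool, save_pos: bool, saved_position: bool) -> str:
    """Return 'saved', 'now', or 'all', following the matrix above (sketch)."""
    # A saved position is only used when SavePos is TRUE.
    if save_pos and saved_position:
        return "saved"
    # Otherwise ReadFromLast chooses between new events only and all events.
    return "now" if read_from_last else "all"
```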

SavePos

If this boolean directive is set to TRUE, the timestamp of the last read event is saved when NXLog Agent exits and read back from the cache file upon startup. The default is TRUE, so the last timestamp is saved if this directive is not specified. This directive affects the outcome of the ReadFromLast directive. The SavePos directive can be overridden by the global NoCache directive.

SecretKey

This optional directive specifies the AWS secret access key.

StartFrom

This optional directive specifies the time in RFC 3339 format of the first event to pull, e.g., 2022-11-19T16:39:57-08:00. If this directive is not specified, the module reads events according to the ReadFromLast directive.
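The following Python sketch parses an RFC 3339 StartFrom value and normalizes it to UTC, the time zone used in object key names. The function name parse_start_from is hypothetical, and this page does not state whether the module compares StartFrom against key names or object timestamps.

```python
from datetime import datetime, timezone


def parse_start_from(value: str) -> datetime:
    """Parse an RFC 3339 StartFrom value and normalize it to UTC (sketch)."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # RFC 3339 timestamps always carry a UTC offset.
    if parsed.tzinfo is None:
        raise ValueError("StartFrom needs a UTC offset, e.g. -08:00 or Z")
    return parsed.astimezone(timezone.utc)
```

For the example value above, 2022-11-19T16:39:57-08:00 corresponds to 2022-11-20 00:39:57 UTC.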

Fields

The following fields are used by im_amazons3.

$raw_event (type: string)

A record from the stored object

Examples

Example 1. Collecting logs from an Amazon S3 bucket

This configuration uses the im_amazons3 input module to collect logs from an Amazon S3 bucket named MYBUCKET. The Server directive specifies that only objects whose names start with SERVER01 should be collected.

<Input amazon_s3>
    Module       im_amazons3
    Region       us-east-1
    Bucket       MYBUCKET
    Server       SERVER01
    StartFrom    2022-09-15T10:23:00-06:00 (1)

    AccessKey    <YOUR_ACCESS_KEY> (2)
    SecretKey    <YOUR_SECRET_KEY> (3)
</Input>
1 StartFrom is an optional directive to specify the oldest event to pull.
2 The AccessKey directive specifies the AWS public access key ID.
3 The SecretKey directive specifies the AWS secret access key.
Example 2. Collecting logs from other Amazon S3-compatible services

This configuration uses the im_amazons3 input module to collect logs from a self-hosted MinIO S3 instance.

<Input amazon_s3>
    Module       im_amazons3
    URL          https://example.net (1)
    Region       myminio
    Bucket       MYBUCKET
    Server       SERVER01

    AccessKey    <YOUR_ACCESS_KEY> (2)
    SecretKey    <YOUR_SECRET_KEY> (3)
</Input>
1 The URL directive is specified to use a custom endpoint.
2 The AccessKey directive specifies the AWS public access key ID.
3 The SecretKey directive specifies the AWS secret access key.