Amazon S3 (im_amazons3)
Amazon Simple Storage Service (S3) is an object storage service offering industry-leading scalability, data availability, security, and performance.
This module can be used to collect logs from Amazon S3 and compatible services.
Amazon S3 buckets, objects, keys, and structure
Amazon S3 stores objects inside containers called buckets. An account can hold a limited number of buckets, and each bucket can store an unlimited number of objects. See Getting Started with Amazon S3 in the Amazon S3 User Guide for more information.
Both the input and output modules interact with a single bucket on Amazon S3. The module will not create, delete, or alter the bucket or any of its properties, permissions, or management options. Instead, you must create the bucket, provide the appropriate permissions (ACL), and further configure any lifecycle, replication, encryption, or other options. Similarly, the module does not alter the storage class of the objects stored or any other properties or permissions.
We selected a schema where we store events in a single bucket.
Each object has a key that references the server or service name, the date, and the time NXLog received the event.
Although Amazon S3 uses a flat structure to store objects, it groups objects with similar key prefixes resembling a filesystem structure.
The following is a visual representation of our naming scheme.
Note that the key name at the fourth level represents the time in UTC.
Because Amazon S3 treats the colon (:) as a special character, we replace it with the dot (.) character in key names.
- MYBUCKET/
  - SERVER01/
    - 2018-05-17/
      - 12.36.34.1
      - 12.36.35.1
    - 2018-05-18/
      - 10.46.34.1
      - 10.46.35.1
      - 10.46.35.2
      - 10.46.36.1
  - SERVER02/
    - 2018-05-16/
      - 14.23.12.1
    - 2018-05-17/
      - 17.03.52.1
      - 17.03.52.2
      - 17.03.52.3
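The naming scheme above can be sketched in Python as follows. This is an illustrative model only, not NXLog code: the function name `object_key` is hypothetical, and treating the trailing number as a counter for events received within the same second is an assumption inferred from the sample keys.

```python
from datetime import datetime, timedelta, timezone


def object_key(server: str, received: datetime, seq: int = 1) -> str:
    """Build a key of the form SERVER/YYYY-MM-DD/HH.MM.SS.N (time in UTC).

    `seq` is assumed to distinguish events received within the same second.
    """
    if received.tzinfo is None:
        raise ValueError("received must be a timezone-aware datetime")
    utc = received.astimezone(timezone.utc)
    # Dots are used instead of colons in the time part of the key.
    return f"{server}/{utc:%Y-%m-%d}/{utc:%H.%M.%S}.{seq}"


# An event received at 14:36:34 in a UTC+2 zone is keyed under 12.36.34 UTC.
received = datetime(2018, 5, 17, 14, 36, 34, tzinfo=timezone(timedelta(hours=2)))
print(object_key("SERVER01", received))  # SERVER01/2018-05-17/12.36.34.1
```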
Configuration
The im_amazons3 module accepts the following directives in addition to the common module directives. The AccessKey, Bucket, Region, SecretKey, and Server directives are required.
- AccessKey
  This mandatory directive specifies the AWS public access key ID.
- Bucket
  This mandatory directive specifies the Amazon S3 bucket name.
- Region
  This mandatory directive specifies the service region code. It accepts any value when used in conjunction with the URL directive. Otherwise, the following codes are supported:
  | Provider | Region | Code |
  |---|---|---|
  | Amazon | US East (N. Virginia) | us-east-1 |
  | Amazon | US East (Ohio) | us-east-2 |
  | Amazon | US West (N. California) | us-west-1 |
  | Amazon | US West (Oregon) | us-west-2 |
  | Amazon | Canada (Central) | ca-central-1 |
  | Amazon | Africa (Cape Town) | af-south-1 |
  | Amazon | Asia Pacific (Hong Kong) | ap-east-1 |
  | Amazon | Asia Pacific (Mumbai) | ap-south-1 |
  | Amazon | Asia Pacific (Tokyo) | ap-northeast-1 |
  | Amazon | Asia Pacific (Seoul) | ap-northeast-2 |
  | Amazon | Asia Pacific (Osaka) | ap-northeast-3 |
  | Amazon | Asia Pacific (Singapore) | ap-southeast-1 |
  | Amazon | Asia Pacific (Sydney) | ap-southeast-2 |
  | Amazon | China (Beijing) | cn-north-1 |
  | Amazon | China (Ningxia) | cn-northwest-1 |
  | Amazon | Europe (Stockholm) | eu-north-1 |
  | Amazon | Europe (Frankfurt) | eu-central-1 |
  | Amazon | Europe (Ireland) | eu-west-1 |
  | Amazon | Europe (London) | eu-west-2 |
  | Amazon | Europe (Paris) | eu-west-3 |
  | Amazon | South America (São Paulo) | sa-east-1 |
  | Amazon | Middle East (Bahrain) | me-south-1 |
  | Digital Ocean | US East (New York City) | nyc3 |
  | Digital Ocean | Europe (Amsterdam) | ams3 |
  | Digital Ocean | Asia Pacific (Singapore) | sgp1 |
  | Digital Ocean | Europe (Frankfurt) | fra1 |
  | Yandex (Object Storage) | Russia | ru-central1 |
  | Wasabi | US East (N. Virginia) | wa-us-east-1 |
  | Wasabi | US East (N. Virginia) | wa-us-east-2 |
  | Wasabi | US West (Oregon) | wa-us-west-1 |
  | Wasabi | Europe (Amsterdam) | wa-eu-central-1 |
- SecretKey
  This mandatory directive specifies the AWS secret access key.
- Server
  This mandatory directive sets the object path prefix. The module will read object names starting with the specified value only. Use / to read all objects. See Amazon S3 buckets, objects, keys, and structure.
- InputType
  See the InputType directive in the list of common module directives. If this directive is not specified, the default is LineBased (the module will use CRLF as the record terminator on Windows or LF on Unix).
  This directive also supports data converters; see the description in the InputType section.
- PathStyle
  This boolean directive changes how the module constructs the URL to cater to providers like MinIO, which accept the bucket name in the path instead of a subdomain. The default is TRUE if the URL directive is specified. Otherwise, this directive is not used. If FALSE, the module will prefix the URL with the Bucket as a subdomain. So, for example, if the URL is https://s3-us-east-2.amazonaws.com and the bucket is mybucket, the resulting URL will be https://mybucket.s3-us-east-2.amazonaws.com.
- PollInterval
  This optional directive specifies, in seconds, how frequently the module checks for new events. If this directive is not specified, it defaults to 60 seconds.
- Reconnect
  This optional directive sets the reconnect interval in seconds. If it is set, the module attempts to reconnect at that fixed interval. If it is not set, the reconnect interval starts at 1 second and doubles on every attempt. If the duration of a successful connection is greater than the current reconnect interval, the reconnect interval is reset to 1 second.
- ReadFromLast
  This optional boolean directive instructs the module to only read logs that arrive after NXLog is started. This directive comes into effect if a saved position is not found, for example on first start, or when the SavePos directive is FALSE. When the SavePos directive is TRUE and a previously saved position is found, the module will always resume reading from the saved position. If ReadFromLast is FALSE, the module will read all the available logs. This can result in a lot of messages and is usually not the expected behavior. If this directive is not specified, it defaults to TRUE.
  The following matrix shows the outcome of this directive in conjunction with the SavePos directive:

  | ReadFromLast | SavePos | Saved Position | Outcome |
  |---|---|---|---|
  | TRUE | TRUE | No | Reads events that are logged after NXLog is started. |
  | TRUE | TRUE | Yes | Reads events from saved position. |
  | TRUE | FALSE | No | Reads events that are logged after NXLog is started. |
  | TRUE | FALSE | Yes | Reads events that are logged after NXLog is started. |
  | FALSE | TRUE | No | Reads all events. |
  | FALSE | TRUE | Yes | Reads events from saved position. |
  | FALSE | FALSE | No | Reads all events. |
  | FALSE | FALSE | Yes | Reads all events. |
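As a cross-check, the matrix above can be written as a small decision function. This Python sketch only restates the documented outcomes; the function name `start_behavior` and the returned labels are hypothetical and do not correspond to anything in NXLog itself.

```python
def start_behavior(read_from_last: bool = True,
                   save_pos: bool = True,
                   saved_position_found: bool = False) -> str:
    """Return where reading starts, following the ReadFromLast/SavePos matrix."""
    # A saved position only matters when SavePos is TRUE.
    if save_pos and saved_position_found:
        return "saved position"
    # Otherwise ReadFromLast decides between new events only and everything.
    if read_from_last:
        return "events logged after start"
    return "all events"


# With both defaults (TRUE) and no saved position, only new events are read.
print(start_behavior())  # events logged after start
```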
- SavePos
  If this boolean directive is set to TRUE, the timestamp of the last read event will be saved when NXLog exits. The timestamp will be read from the cache file upon startup. The default is TRUE, so the last timestamp will be saved if this directive is not specified. This directive affects the outcome of the ReadFromLast directive. The SavePos directive can be overridden by the global NoCache directive.
- StartFrom
  This optional directive specifies the time in RFC 3339 format of the first event to pull, e.g., 2022-11-19T16:39:57-08:00. If this directive is not specified, the module reads events according to the ReadFromLast directive.
- URL
  Specify the URL for a custom endpoint. If the protocol is not specified, the module will use HTTPS.
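The default back-off described for the Reconnect directive above can be modeled roughly as follows. This Python sketch is one reading of that description, not the module's actual implementation: the class name `ReconnectTimer` and its methods are hypothetical, and no upper bound on the interval is modeled because the description does not mention one.

```python
from typing import Optional


class ReconnectTimer:
    """Model of the reconnect interval: fixed if configured, doubling otherwise."""

    def __init__(self, fixed: Optional[int] = None):
        self.fixed = fixed      # value of the Reconnect directive, if set
        self.interval = 1       # default back-off starts at 1 second

    def next_delay(self) -> int:
        """Return the seconds to wait before the next reconnect attempt."""
        if self.fixed is not None:
            return self.fixed
        delay = self.interval
        self.interval *= 2      # doubles on every attempt
        return delay

    def connection_ended(self, connected_for: float) -> None:
        """Reset the back-off if the connection outlived the current interval."""
        if connected_for > self.interval:
            self.interval = 1


timer = ReconnectTimer()
print([timer.next_delay() for _ in range(4)])  # [1, 2, 4, 8]
```

Passing a value, for example `ReconnectTimer(5)`, models a configured Reconnect directive, in which case every delay is the same.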
Examples
This configuration uses the im_amazons3 input module to collect logs from an Amazon S3 bucket named MYBUCKET.
The Server directive specifies that only object names starting with SERVER01 should be collected.
<Input amazon_s3>
    Module       im_amazons3
    Region       us-east-1
    Bucket       MYBUCKET
    Server       SERVER01
    StartFrom    2022-09-15T10:23:00-06:00
    AccessKey    <YOUR_ACCESS_KEY>
    SecretKey    <YOUR_SECRET_KEY>
</Input>
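StartFrom takes an RFC 3339 timestamp with a UTC offset, whereas the key names in the naming scheme described earlier use UTC. To check which UTC instant a given StartFrom value denotes, a short Python sketch such as the following may help. The helper name `start_from_utc` is hypothetical, and the sketch makes no claim about how the module itself parses the value.

```python
from datetime import datetime, timezone


def start_from_utc(value: str) -> datetime:
    """Parse an RFC 3339 timestamp and return it as a UTC datetime."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit offset first.
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("an RFC 3339 timestamp must carry a UTC offset")
    return parsed.astimezone(timezone.utc)


print(start_from_utc("2022-09-15T10:23:00-06:00").isoformat())
# 2022-09-15T16:23:00+00:00
```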
This configuration uses the im_amazons3 input module to collect logs from a self-hosted MinIO S3 instance.
<Input amazon_s3>
    Module       im_amazons3
    URL          https://example.net
    Region       myminio
    Bucket       MYBUCKET
    Server       SERVER01
    AccessKey    <YOUR_ACCESS_KEY>
    SecretKey    <YOUR_SECRET_KEY>
</Input>
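How the URL, Bucket, and PathStyle directives combine can be illustrated with a short Python sketch. The subdomain form follows the PathStyle description and the HTTPS default follows the URL description. The path form, with the bucket appended as the first path segment, is the usual S3 path-style convention and is an assumption about what the module actually requests. The function name `bucket_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def bucket_url(url: str, bucket: str, path_style: bool = True) -> str:
    """Combine a custom endpoint URL and a bucket name."""
    # If the protocol is not specified, HTTPS is used.
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    if path_style:
        # Assumed path-style form: https://host/bucket
        path = parts.path.rstrip("/") + "/" + bucket
        return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    # Documented subdomain form: https://bucket.host
    return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", parts.path, "", ""))


print(bucket_url("https://s3-us-east-2.amazonaws.com", "mybucket", path_style=False))
# https://mybucket.s3-us-east-2.amazonaws.com
print(bucket_url("example.net", "MYBUCKET"))
# https://example.net/MYBUCKET
```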