Google Cloud Pub/Sub (im_googlepubsub)

Google Cloud Pub/Sub is a service that allows event producers to deliver events to subscribers asynchronously. It is commonly used for data streaming, real-time event distribution, or as a message queue for parallel workflows.

This module uses the Google Pub/Sub REST API to create a subscription and collect logs from a Google Pub/Sub topic.

To examine the supported platforms, see the list of installation packages.

Configuring a Google Cloud service account

im_googlepubsub requires a Google Cloud service account and a corresponding private key file in JSON format to connect to the Google Cloud Pub/Sub API. Follow these instructions to create a new service account and download its private key file for an existing project.

  1. Log in to your Google Cloud account and switch to the project you want to configure.

  2. From the navigation menu, click IAM & Admin > Service Accounts.

  3. Click CREATE SERVICE ACCOUNT.

  4. Enter a service account name and description and click CREATE AND CONTINUE.

  5. Select the Owner role from the Role drop-down and click DONE.

  6. Click on the newly created account on the Service accounts page to open its configuration page.

  7. Click the KEYS tab, expand the ADD KEY drop-down and select Create new key.

  8. Select JSON for the key type and click CREATE to download the private key. Save the private key file to a location accessible by NXLog Agent. This file is required for the NXLog Agent configuration.


Configuration

The im_googlepubsub module accepts the following directives in addition to the common module directives. The CredentialsFile and Subscription directives are required.

Required directives

The following directives are required for the module to start.

CredentialsFile

This mandatory directive specifies the path to the private key file of the service account used to authenticate with Google Pub/Sub. See Configuring a Google Cloud service account for more information.

Subscription

This mandatory directive specifies the name of the subscription from which messages should be pulled. The format is projects/{project}/subscriptions/{sub}.

HTTP(S) directives

The following directives are for configuring HTTP(S) connection settings.

AddHeader

This optional directive can be specified multiple times to add custom headers to each HTTP request.
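
As an illustration, the following sketch adds two custom headers to every request. The header names and values are hypothetical, and the Name: value syntax is assumed to match the form used by other NXLog Agent HTTP modules.

```
<Input google_pubsub>
    Module             im_googlepubsub
    CredentialsFile    /path/to/credentials.json
    Subscription       projects/myproject-343509/subscriptions/test
    # Hypothetical headers; one AddHeader line per header
    AddHeader          X-Collector-Name: nxlog-agent-01
    AddHeader          X-Environment: staging
</Input>
```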

Compression

This optional directive can be used to enable HTTP compression for outgoing HTTP messages. The possible values are none, gzip, and deflate. By default, compression is disabled. Please note that some HTTP servers may not accept compressed HTTP requests. If a server doesn’t support a specific compression method, it may return 415 Unsupported Media Type errors in response to compressed requests.

HTTPBasicAuthPassword

This optional directive specifies the password for HTTP basic authorization.

HTTP authorization works only when both HTTPBasicAuthUser and HTTPBasicAuthPassword parameters are set.

HTTPBasicAuthUser

This optional directive specifies the username for HTTP basic authorization.

HTTP authorization works only when both HTTPBasicAuthUser and HTTPBasicAuthPassword parameters are set.
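
A minimal sketch of setting both parameters together, for example when requests pass through an intermediary that expects basic authorization. The username and password shown are placeholders, not values from any real deployment.

```
<Input google_pubsub>
    Module                   im_googlepubsub
    CredentialsFile          /path/to/credentials.json
    Subscription             projects/myproject-343509/subscriptions/test
    # Both directives must be set for HTTP authorization to take effect
    HTTPBasicAuthUser        exampleuser
    HTTPBasicAuthPassword    examplepassword
</Input>
```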

HTTPSAllowExpired

Specifies if the connection should be allowed with an expired certificate. If set to TRUE, the remote host will be able to connect with an expired certificate. The default is FALSE: the certificate must not be expired.

HTTPSAllowUntrusted

Specifies if the connection should be allowed without certificate verification. If set to TRUE, the connection will be allowed even if the remote host presents an unknown or self-signed certificate. The default value is FALSE: the remote host must present a trusted certificate.

HTTPSCADir

The path to a directory containing certificate authority (CA) certificates. These certificates will be used to verify the certificate presented by the remote host. The certificate files must be named using the OpenSSL hashed format, i.e. the hash of the certificate followed by .0, .1 etc. To find the hash of a certificate using OpenSSL:

$ openssl x509 -hash -noout -in ca.crt

For example, if the certificate hash is e2f14e4a, then the certificate filename should be e2f14e4a.0. If there is another certificate with the same hash then it should be named e2f14e4a.1 and so on.

A remote host’s self-signed certificate (which is not signed by a CA) can also be trusted by including a copy of the certificate in this directory.

The default operating system root certificate store will be used if this directive is not specified. Unix-like operating systems commonly store root certificates in /etc/ssl/certs. Windows operating systems use the Windows Certificate Store, while macOS uses the Keychain Access Application as the default certificate store. See Certification Authority (CA) certificates in the NXLog Platform User Guide for more information on using this directive.

In addition, Microsoft’s PKI repository contains root certificates for Microsoft services.
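
The following sketch points the module at a directory of CA certificates instead of the operating system store. The directory path is hypothetical; the files in it are assumed to be named using the OpenSSL hashed format described above, such as e2f14e4a.0.

```
<Input google_pubsub>
    Module             im_googlepubsub
    CredentialsFile    /path/to/credentials.json
    Subscription       projects/myproject-343509/subscriptions/test
    # Hypothetical directory containing files such as e2f14e4a.0
    HTTPSCADir         /path/to/ca-certs
</Input>
```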

HTTPSCAFile

The path of the certificate authority (CA) certificate that will be used to verify the certificate presented by the remote host. A remote host’s self-signed certificate (which is not signed by a CA) can be trusted by specifying the remote host certificate itself. In case of certificates signed by an intermediate CA, the certificate specified must contain the complete certificate chain (certificate bundle).

HTTPSCertFile

The path of the certificate file that will be presented to the remote host during the HTTPS handshake.

HTTPSCertKeyFile

The path of the private key file that was used to generate the certificate specified by the HTTPSCertFile directive. This is used for the HTTPS handshake.
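
As a sketch of how the certificate directives fit together, the configuration below verifies the remote host against a single CA file and presents a client certificate during the handshake. All three paths are hypothetical.

```
<Input google_pubsub>
    Module              im_googlepubsub
    CredentialsFile     /path/to/credentials.json
    Subscription        projects/myproject-343509/subscriptions/test
    # CA certificate (or bundle) used to verify the remote host
    HTTPSCAFile         /path/to/ca-bundle.pem
    # Client certificate and its private key
    HTTPSCertFile       /path/to/client-cert.pem
    HTTPSCertKeyFile    /path/to/client-key.pem
</Input>
```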

Proxy

This optional directive is used to specify the protocol, IP address (or hostname) and port number of the HTTP or SOCKS proxy host to be used. The format is protocol://hostname:port.
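
For example, a configuration routed through an HTTP proxy might look like the sketch below. The proxy hostname and port are hypothetical and only illustrate the protocol://hostname:port format.

```
<Input google_pubsub>
    Module             im_googlepubsub
    CredentialsFile    /path/to/credentials.json
    Subscription       projects/myproject-343509/subscriptions/test
    # Hypothetical proxy host and port
    Proxy              http://proxy.example.com:8080
</Input>
```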

Reconnect

This optional directive sets the reconnect interval in seconds. If it is set, the module attempts to reconnect at the specified interval. If it is not set, the reconnect interval starts at 1 second and doubles with every attempt. If the duration of a successful connection is greater than the current reconnect interval, the reconnect interval is reset to 1 second.

The Reconnect directive must be used with caution. If it is used on multiple systems, it can send reconnect requests simultaneously to the same destination, potentially overloading the destination system. It may also cause NXLog Agent to use unusually high system resources or cause NXLog Agent to become unresponsive.
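
A sketch of a fixed reconnect interval is shown below. The value of 30 seconds is an arbitrary assumption chosen for illustration; omitting the directive keeps the doubling interval described above.

```
<Input google_pubsub>
    Module             im_googlepubsub
    CredentialsFile    /path/to/credentials.json
    Subscription       projects/myproject-343509/subscriptions/test
    # Assumed value: retry every 30 seconds instead of the doubling interval
    Reconnect          30
</Input>
```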

ReconnectOnData

This optional directive defines the behavior when the connection with the remote host is lost. When set to TRUE, the module only attempts to reconnect when it has data to send. The default value is FALSE; it will always keep a connection open with the remote host.

Optional directives

Acknowledge

This optional boolean directive specifies whether an acknowledgment should be sent for received messages. The default is TRUE.

PollInterval

This optional directive specifies how frequently, in seconds, the module checks for new events. If this directive is not specified, it defaults to 20 seconds.
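
The sketch below sets Acknowledge explicitly and shortens the polling interval. The interval of 5 seconds is an assumed value for illustration, not a recommendation.

```
<Input google_pubsub>
    Module             im_googlepubsub
    CredentialsFile    /path/to/credentials.json
    Subscription       projects/myproject-343509/subscriptions/test
    # Acknowledge received messages (this is also the default)
    Acknowledge        TRUE
    # Assumed value: check for new events every 5 seconds instead of 20
    PollInterval       5
</Input>
```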

ReadFromLast

This optional boolean directive instructs the module to only read logs that arrive after NXLog Agent is started. It is not normally required if the Acknowledge directive is set to TRUE (default), as the Google Cloud Pub/Sub server will manage the starting point when a connection is (re-)established. This directive takes effect if a saved position is not found, for example on the first start, or when the SavePos directive is FALSE. When the SavePos directive is TRUE and a previously saved position is found, the module will always resume reading from the saved position. If ReadFromLast is FALSE, the module will read all the available logs. This can result in a large number of messages and is usually not the expected behavior. For this directive to have an effect, the subscription must have the "Retain acknowledged messages" option activated. If this directive is not specified, it defaults to TRUE.

The following matrix shows the outcome of this directive in conjunction with the SavePos directive:

ReadFromLast  SavePos  Saved position  Outcome
TRUE          TRUE     No              Reads events that are logged after NXLog Agent is started.
TRUE          TRUE     Yes             Reads events from the saved position.
TRUE          FALSE    No              Reads events that are logged after NXLog Agent is started.
TRUE          FALSE    Yes             Reads events that are logged after NXLog Agent is started.
FALSE         TRUE     No              Reads all events.
FALSE         TRUE     Yes             Reads events from the saved position.
FALSE         FALSE    No              Reads all events.
FALSE         FALSE    Yes             Reads all events.

SavePos

If this boolean directive is set to TRUE, the timestamp of the last read event will be saved when NXLog Agent exits and read from the cache file upon startup. This directive is not normally required if the Acknowledge directive is set to TRUE (default), as the Google Cloud Pub/Sub server will manage the starting point when a connection is (re-)established. For this directive to have an effect, the subscription must have the "Retain acknowledged messages" option activated. The default is TRUE: the last timestamp will be saved if this directive is not specified. This directive affects the outcome of the ReadFromLast directive. The SavePos directive can be overridden by the global NoCache directive.
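
To illustrate how ReadFromLast and SavePos combine, the sketch below reads all available events when no saved position exists and resumes from the saved position on later starts. It assumes the subscription has the "Retain acknowledged messages" option activated.

```
<Input google_pubsub>
    Module             im_googlepubsub
    CredentialsFile    /path/to/credentials.json
    Subscription       projects/myproject-343509/subscriptions/test
    # With no saved position: read all events
    ReadFromLast       FALSE
    # With a saved position: resume from it
    SavePos            TRUE
</Input>
```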

URL

This optional directive specifies a region-specific URL. The default is https://pubsub.googleapis.com.
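
A sketch of overriding the default endpoint follows. The regional hostname shown is an assumption for illustration; consult the Google Cloud documentation for the endpoint that applies to your region.

```
<Input google_pubsub>
    Module             im_googlepubsub
    CredentialsFile    /path/to/credentials.json
    Subscription       projects/myproject-343509/subscriptions/test
    # Assumed regional endpoint; verify it before use
    URL                https://us-central1-pubsub.googleapis.com
</Input>
```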

Fields

The following fields are used by im_googlepubsub.

$raw_event (type: string)

A list of event fields in key-value pairs.

$Attributes.* (type: string)

Attributes for this message. It contains user-defined keys.

$Data (type: string)

The message data field.

$MessageId (type: string)

ID of this message, assigned by the server when the message is published.

$OrderingKey (type: string)

If non-empty, identifies related messages for which publish order should be respected. If a Subscription has enableMessageOrdering set to true, messages published with the same non-empty orderingKey value will be delivered to subscribers in the order in which they are received by the Pub/Sub system.

$PublishTime (type: datetime)

The time at which the message was published, populated by the server when it receives the topics.publish call.
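
As a rough sketch of how these fields might be used, the Exec block below logs the message ID and copies the message data into $raw_event. It assumes $Data holds the payload in a form suitable for forwarding as is; adjust the logic if your messages are encoded differently.

```
<Input google_pubsub>
    Module             im_googlepubsub
    CredentialsFile    /path/to/credentials.json
    Subscription       projects/myproject-343509/subscriptions/test
    <Exec>
        # Log the server-assigned ID of each received message
        if defined $MessageId log_info("Received Pub/Sub message " + $MessageId);
        # Assumption: forward only the message payload as the raw event
        if defined $Data $raw_event = $Data;
    </Exec>
</Input>
```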

Examples

Example 1. Collecting logs from Google Pub/Sub

This configuration uses the im_googlepubsub input module to create the subscription test and collect logs from the Google Cloud project myproject-343509.

<Input google_pubsub>
    Module             im_googlepubsub
    CredentialsFile    /path/to/credentials.json (1)
    Subscription       projects/myproject-343509/subscriptions/test
</Input>
(1) Credentials file for authenticating with Google Pub/Sub. See Configuring a Google Cloud service account for more information.