WebHDFS (om_webhdfs)
This module allows logs to be stored in Hadoop HDFS using the WebHDFS protocol.
Configuration
The om_webhdfs module accepts the following directives in addition to the common module directives. The File and URL directives are required.
Required directives
The following directives are required for the module to start.
File
This mandatory directive specifies the name of the destination file. It must be a string type expression. If the expression is not a constant string (it contains functions, field names, or operators), it is evaluated before each request is dispatched to the WebHDFS REST endpoint (and after the Exec is evaluated). Note that the filename must be quoted to be a valid string literal, unlike in other directives that take a filename argument.
URL
This mandatory directive specifies the URL of the WebHDFS REST endpoint where the module should POST the event data. The module operates in plain HTTP or HTTPS mode depending on the URL provided, and connects to the hostname specified in the URL. If the port number is not explicitly indicated in the URL, it defaults to port 80 for HTTP and port 443 for HTTPS.
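As a minimal sketch of a non-constant File expression, the instance below builds the destination filename from a field, so the expression would be evaluated before each request. The instance name, the port, the /logs/ path, and the choice of the $SourceModuleName field are assumptions made for illustration only.

```
<Output hdfs_dynamic>
    Module  om_webhdfs
    # Explicit port (assumed value); without it, port 80 is used for HTTP
    URL     http://hdfsserver.domain.com:9870/
    # Quoted string literal concatenated with a field: not a constant string
    File    "/logs/" + $SourceModuleName + ".log"
</Output>
```

Because the expression references a field, events from different input instances would end up in different destination files.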
TLS/SSL directives
The following directives are for configuring secure data transfer via TLS/SSL.
This boolean directive specifies whether the connection should be allowed with an expired certificate.
This boolean directive specifies that the connection should be allowed regardless of the certificate verification results.
HTTPSCADir
This directive specifies a path to a directory containing certificate authority (CA) certificates. These certificates will be used to verify the certificate presented by the remote server. The certificate files must be named using the OpenSSL hashed format, i.e. the hash of the certificate followed by .0, .1, etc. The hash of a certificate can be obtained with the OpenSSL x509 command and its -hash option. For example, a certificate with the hash <hash> must be named <hash>.0. A remote server’s self-signed certificate (which is not signed by a CA) can also be trusted by including a copy of the certificate in this directory. The default operating system root certificate store will be used if this directive is not specified. Unix-like operating systems commonly store root certificates in a system-wide directory.
HTTPSCAFile
This directive specifies the path of the certificate authority (CA) certificate that will be used to verify the certificate presented by the remote server. A remote server’s self-signed certificate (which is not signed by a CA) can be trusted by specifying the remote server certificate itself. For certificates signed by an intermediate CA, the specified file must contain the complete certificate chain (certificate bundle).
This optional directive specifies the thumbprint of the certificate authority (CA) certificate that will be used to verify the certificate presented by the remote server. The hexadecimal fingerprint string can be copied from Windows Certificate Manager (certmgr.msc). Whitespace is automatically removed. The certificate must be added to a Windows certificate store that is accessible by NXLog. This directive is only supported on Windows and is mutually exclusive with the HTTPSCADir and HTTPSCAFile directives.
HTTPSCertFile
This directive specifies the path of the certificate file to be used for the HTTPS handshake.
HTTPSCertKeyFile
This directive specifies the path of the private key file that was used to generate the certificate specified by the HTTPSCertFile directive. This is used for the HTTPS handshake.
This optional directive specifies the thumbprint of the certificate that will be presented to the remote server during the HTTPS handshake. The hexadecimal fingerprint string can be copied from Windows Certificate Manager (certmgr.msc). Whitespace is automatically removed. The certificate must be imported into a Windows certificate store that is accessible by NXLog. This directive is only supported on Windows and is mutually exclusive with the HTTPSCertFile and HTTPSCertKeyFile directives.
This directive specifies a path to a directory containing certificate revocation list (CRL) files. These CRL files will be used to check for certificates that were revoked and should no longer be accepted. The files must be named using the OpenSSL hashed format, i.e. the hash of the issuer followed by .r0, .r1, etc. The hash of the issuer of a CRL file can be obtained with the OpenSSL crl command and its -hash option. For example, a CRL whose issuer hash is <hash> must be named <hash>.r0.
This directive specifies the path of the certificate revocation list (CRL) that will be used to check for certificates that have been revoked and should no longer be accepted. A CRL file can be generated with the OpenSSL ca command and its -gencrl option.
This optional directive specifies a file containing DH parameters for the Diffie-Hellman key exchange. These parameters can be generated with dhparam(1ssl). If this directive is not specified, default parameters will be used. See the OpenSSL Wiki for further details.
This directive specifies the passphrase of the private key specified by the HTTPSCertKeyFile directive. A passphrase is required when the private key is encrypted. For example, a private key with Triple DES encryption can be generated with the OpenSSL genrsa command and its -des3 option. This directive is not needed for passwordless private keys.
This optional boolean directive, when set to …
HTTPSSSLCipher
This optional directive can be used to set the permitted SSL cipher list, overriding the default. Use the format described in the ciphers(1ssl) man page.
This optional directive can be used to set the permitted cipher list for TLSv1.3. Use the same format as in the HTTPSSSLCipher directive. Refer to the OpenSSL documentation for a list of valid TLSv1.3 cipher suites. A default list applies if this directive is not specified.
This boolean directive enables data compression when sending data over the network. The compression mechanism is based on the zlib compression library.
This directive can be used to set the allowed SSL/TLS protocol(s). It takes a comma-separated list of protocol versions.
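The TLS directives above might be combined as in the following sketch, which verifies the server against a single CA file and presents a client certificate. The instance name, port, and all file paths are hypothetical values chosen for illustration; only directive names that appear in this section are used.

```
<Output hdfs_tls>
    Module            om_webhdfs
    # An https:// URL selects HTTPS mode; the port shown is an assumption
    URL               https://hdfsserver.domain.com:9871/
    File              "myfile"
    # CA certificate (or bundle) used to verify the remote server
    HTTPSCAFile       /opt/nxlog/cert/ca.pem
    # Client certificate and its private key for the HTTPS handshake
    HTTPSCertFile     /opt/nxlog/cert/client-cert.pem
    HTTPSCertKeyFile  /opt/nxlog/cert/client-key.pem
</Output>
```

If the private key were encrypted, the passphrase directive described above would also be needed.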
Optional directives
FlushInterval
The module will send the data to the endpoint defined in URL after this amount of time in seconds, unless FlushLimit is reached first. This defaults to 5 seconds.
FlushLimit
When the number of events in the output buffer reaches the value specified by this directive, the module will send the data to the endpoint defined in URL. This defaults to 500 events. If the log volume is low, the FlushInterval may trigger the write request before this limit is reached, so that data is sent promptly.
QueryParam
This directive can be used to specify additional HTTP query parameters such as BlockSize. It may be specified more than once to define multiple parameters.
This optional directive sets the reconnect interval in seconds. If it is set, the module attempts to reconnect at this fixed interval. If it is not set, the reconnect interval starts at 1 second and doubles with every attempt. If a successful connection lasts longer than the current reconnect interval, the interval is reset to 1 second.
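To illustrate how the flushing directives interact, the sketch below sends a batch after 2 seconds or 1000 buffered events, whichever comes first. The instance name and both numeric values are arbitrary assumptions, not recommended settings.

```
<Output hdfs_batched>
    Module         om_webhdfs
    URL            http://hdfsserver.domain.com/
    File           "myfile"
    # Flush after 2 seconds, or sooner once 1000 events are buffered
    FlushInterval  2
    FlushLimit     1000
    # Additional query parameter passed with each request
    QueryParam     blocksize 42
</Output>
```

A shorter interval reduces delivery latency at low log volume, while a higher limit produces fewer, larger requests at high volume.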
Procedures
The following procedures are exported by om_webhdfs.
reconnect();
Force a reconnection. This can be used from a Schedule block to periodically reconnect to the server.
The reconnect() procedure must be used with caution. If configured, it can attempt to reconnect after every event sent, potentially overloading the destination system.
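A periodic reconnect from a Schedule block could look like the following sketch. The instance name and the one-hour interval are assumptions; a long interval is chosen here to stay clear of the overload risk described above.

```
<Output hdfs_scheduled>
    Module  om_webhdfs
    URL     http://hdfsserver.domain.com/
    File    "myfile"
    <Schedule>
        # Force a reconnection once per hour
        Every   1 hour
        Exec    reconnect();
    </Schedule>
</Output>
```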
Examples
This example output module instance forwards messages to the specified URL and file using the WebHDFS protocol.
<Output hdfs>
Module om_webhdfs
URL http://hdfsserver.domain.com/
File "myfile"
QueryParam blocksize 42
QueryParam destination /foo
</Output>