Centralized Log Collection
Centralized log collection, log aggregation, or log centralization is the process of sending event log data to a dedicated server or service for storage, and optionally for search and analytics. Storing logs on a centralized system offers several benefits over storing the data locally.
- Event data can be accessed even if the originating server is offline, compromised, or decommissioned.
- Data can be analyzed and correlated across multiple systems.
- It is more difficult for malicious actors to remove evidence from logs that have already been forwarded.
- Incident investigation and auditing are easier because all event data is collected in a single location.
- Scalability, high availability, and redundancy are easier to implement and maintain because they only need to be handled at the collection server.
- Compliance with internal and external log retention standards only needs to be managed at a single point.
Architecture
The following diagram depicts an example of centralized log collection architecture. The single, central server collects logs from other servers, applications, and network devices. After collection, the logs can be forwarded as required for further analysis or storage.
In practice, network topology and other requirements may call for additional servers, such as relays, to handle logs. In those cases, functionality beyond what is covered here, such as buffering, may be needed.
The following diagram depicts an example of a log collection architecture in which NXLog also acts as a relay, forwarding logs to any destination such as a SIEM or a log collection server. This is especially useful when networks must be kept separate.
This chapter is concerned with the system shown in the first diagram: collecting logs from clients on a central server.
Collection Modes
In the context of clients generating logs, NXLog supports both agent-based and agent-less log collection, and it is possible to configure a system to use both in mixed mode. In brief, these modes differ as follows (see the Log processing modes section for more details).
Agent-based log collection requires that an NXLog agent be installed on the client. With a local agent, collection is much more flexible, providing features such as filtering on the source system to send only the required data, format conversion, compression, encryption, and delivery reliability, among others. It is generally recommended that NXLog be deployed as an agent wherever possible.
With agent-based log collection, NXLog agents are installed on both the client and the central server. Here, the im_batchcompress and om_batchcompress modules are used to transport logs both compressed and encrypted. These modules preserve all the fields in the event record.
# Client: forward events to the central server in compressed, TLS-encrypted batches
<Output batch>
    Module      om_batchcompress
    Host        192.168.56.101
    Port        2514
    UseSSL      TRUE
    CAFile      /opt/openssl_rootca/rootCA.pem
    CertFile    /opt/openssl_server/server.crt
    CertKeyFile /opt/openssl_server/server.key
</Output>

# Central server: accept compressed, encrypted batches from agents
<Input batch>
    Module      im_batchcompress
    ListenAddr  0.0.0.0
    Port        2514
    CAFile      /opt/openssl_rootca/rootCA.pem
    CertFile    /opt/openssl_server/central.crt
    CertKeyFile /opt/openssl_server/central.key
</Input>
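The two fragments above show only the transport; a working client also needs an input and a route. The following is a minimal sketch, assuming the agent reads a local text file. The path `/var/log/messages` and the names `file_in` and `file_to_central` are hypothetical. It drops debug lines on the client, the kind of source-side filtering that agent-based collection makes possible, so that those lines never cross the network.

```
# Client side (hypothetical input; complements the "batch" output above)
<Input file_in>
    Module  im_file
    File    "/var/log/messages"
    # Discard lines containing "debug" (case-insensitive) before they are sent
    Exec    if $raw_event =~ /debug/i drop();
</Input>

# Send everything that survives the filter to the om_batchcompress output
<Route file_to_central>
    Path    file_in => batch
</Route>
```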
In agent-less mode, there is no NXLog agent installed on the client. Instead, the client forwards events to the central server in a native format. On the central server, NXLog accepts and parses the logs received. Often there is limited control over the log format used, and it may not be possible to implement encryption, compression, delivery reliability, or other features.
With agent-less collection, NXLog is installed on the central server but not on the client. Clients can be configured to send UDP Syslog messages to the central server using their native logging functionality. The im_udp module below could be replaced with im_tcp or im_ssl, depending on which protocol the clients support.
UDP transport does not provide any guarantee of delivery. Network congestion or other issues may result in lost log data.
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input input_udp>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    Exec    parse_syslog();
</Input>
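If the clients can send Syslog over TLS, the im_udp input can be swapped for im_ssl. This is a sketch under several assumptions: port 6514 (the conventional Syslog-over-TLS port) and the certificate paths, which are modeled on the agent-based example; newline-terminated messages, which is the default framing for stream inputs; and the `ListenAddr` directive name, which has varied between NXLog releases, so check the im_ssl reference for your version.

```
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input input_tls>
    Module      im_ssl
    ListenAddr  0.0.0.0
    Port        6514
    CAFile      /opt/openssl_rootca/rootCA.pem
    CertFile    /opt/openssl_server/central.crt
    CertKeyFile /opt/openssl_server/central.key
    # Parse the BSD or IETF Syslog header into fields such as $Hostname and $Message
    Exec        parse_syslog();
</Input>
```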
It is common for logs to be collected using both modes among the various clients, network devices, relays, and log servers in a network. For example, an NXLog relay may be configured to collect logs from both agents and agent-less sources and perform filtering and processing before forwarding the data to a central server.
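A relay that handles both kinds of traffic might be sketched as follows. The instance names and the filter are hypothetical, TLS is omitted for brevity, and the upstream address reuses the central server from the agent-based example. Agent events arrive through im_batchcompress with all fields intact. Agent-less Syslog arrives through im_udp and is parsed so that it carries comparable fields. Debug-level Syslog messages (severity 7) are dropped before everything is forwarded in compressed batches.

```
<Extension _syslog>
    Module  xm_syslog
</Extension>

# Events from NXLog agents (all fields preserved)
<Input from_agents>
    Module      im_batchcompress
    ListenAddr  0.0.0.0
    Port        2514
</Input>

# Native Syslog from agent-less devices
<Input from_devices>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    # Extract Syslog header fields, then discard debug-level messages
    Exec    parse_syslog();
    Exec    if $SyslogSeverityValue == 7 drop();
</Input>

# Forward both streams to the central server
<Output to_central>
    Module  om_batchcompress
    Host    192.168.56.101
    Port    2514
</Output>

<Route relay>
    Path    from_agents, from_devices => to_central
</Route>
```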
Requirements
Various logging requirements may dictate particular details about the chosen logging architecture. The following are important considerations when deciding how to set up centralized log collection. In some cases, these requirements can only be met by using agent-based collection.
- Reliability
  UDP does not guarantee message delivery and should be avoided if data loss is unacceptable. TCP (and therefore TLS, which runs over TCP) retransmits lost packets, but events can still be lost if a connection fails. With agent-based collection, NXLog can additionally provide application-level guaranteed delivery. See Reliable network delivery for more information.
- Structured data
  Correlating data across multiple log sources requires parsing event data into a common set of fields. Event fields are a core part of NXLog processing, and an NXLog agent can be configured to parse events at any point along their path to the central server. Parsing is often done as early as possible (at the source, for agent-based collection) to simplify later categorization and to reduce the processing load on log servers as logs are received. See Parsing various log formats and Log classification.
- Encryption
  To maintain the confidentiality of log data, TLS can be used to encrypt it in transit.
- Compression
  If bandwidth is a concern, log data compression may be desirable. Most event data is highly compressible, so bandwidth requirements can be reduced significantly. The im_batchcompress and om_batchcompress modules provide batched, compressed transport of log data between NXLog agents.
- Storage format
  When dealing with heterogeneous log sources, data should normally be converted to, and stored in, a common format.
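As one way to meet the storage format requirement, the central server can write every event as JSON, one object per line. This sketch assumes the `batch` input from the central server example above. The output path and the names `json_file` and `store` are hypothetical.

```
<Extension _json>
    Module  xm_json
</Extension>

<Output json_file>
    Module  om_file
    File    "/var/log/nxlog/central.json"
    # Serialize all event fields into $raw_event as a single JSON object
    Exec    to_json();
</Output>

<Route store>
    Path    batch => json_file
</Route>
```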
Data Formats
When using agent-based collection, it is often desirable to convert the data prior to transfer. In this case, structured data is often sent using one of these formats.
- Batch compression modules
  The im_batchcompress and om_batchcompress modules can be used to send logs in compressed, and optionally encrypted, batches. All fields in the event record are preserved.
- NXLog binary format
  NXLog has its own binary format (see Binary InputType and Binary OutputType) that retains all the fields of an event and can be used to send logs via TCP, UDP, or TLS, or with other stream-oriented modules.
- JSON
  JSON is easy to generate and parse and has become a de facto standard for logging. It has some limitations, such as the lack of a native datetime type. See the JSON section.
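The following minimal sketch uses the NXLog binary format over plain TCP. The two halves belong in different configuration files. Setting `OutputType Binary` on the sender and `InputType Binary` on the receiver keeps every event field intact. Port 1514 and the instance names are assumptions, the listen-address directive name may differ on older releases, and om_ssl/im_ssl can be used instead when encryption is needed.

```
# Client side: send events with all fields in NXLog binary format
<Output tcp_binary>
    Module      om_tcp
    Host        192.168.56.101
    Port        1514
    OutputType  Binary
</Output>

# Central server side: decode the binary stream back into events with the same fields
<Input tcp_binary>
    Module      im_tcp
    ListenAddr  0.0.0.0
    Port        1514
    InputType   Binary
</Input>
```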
Agent-less collection is restricted to formats supported by the clients. The following are a few common formats, but many more are supported. See also the OS Support chapters.
- Syslog
  Syslog is widely used, and many SIEM vendors and products support (or even require) it. Syslog messages contain free-form data that typically needs to be parsed to extract more information for further analysis. Syslog is usually transported over UDP, TCP, or TLS. See the Collecting, parsing, and forwarding syslog logs chapter for more details.
- Snare
  The Snare format is commonly used to transport Windows Event Log data, with or without Syslog headers.
- Windows Event Forwarding (WEF)
  Windows Event Log can be forwarded over HTTPS with Windows Event Forwarding. See the Collecting logs from Windows Event Log chapter.
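Because many SIEMs expect Syslog, the central server often re-emits collected events in that format. This sketch assumes a hypothetical SIEM at `192.168.56.200` that accepts IETF (RFC 5424) Syslog on TCP port 514. It also assumes the `input_udp` instance from the agent-less example. Omit the `_syslog` extension block if the configuration already defines it.

```
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Output siem>
    Module  om_tcp
    Host    192.168.56.200
    Port    514
    # Rebuild $raw_event as an RFC 5424 (IETF) Syslog message from the event fields
    Exec    to_syslog_ietf();
</Output>

<Route to_siem>
    Path    input_udp => siem
</Route>
```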