Parse logs in Common and Combined Log Format

The Common Log Format (also known as the NCSA Common Log Format) and the Combined Log Format are text-based log formats most commonly used for web server access logs. The two formats are almost identical, except that the Combined Log Format appends two fields: referer and user-agent.

Below, we provide examples of collecting and parsing these formats with NXLog Agent.

Common Log Format syntax
host ident authuser [date] "request" status size
Combined Log Format syntax
host ident authuser [date] "request" status size "referer" "user-agent"

A hyphen (-) indicates the field is not available.

Table 1. Common and Combined Log Format fields
Field         Description
host          IP address of the client
ident         RFC 1413 identity of the client
authuser      The user accessing the document (not applicable for public documents)
date          Timestamp of the request
request       Request line received from the client. It includes the HTTP method, the resource requested, and the HTTP protocol version.
status        HTTP status code returned to the client
size          Size of the object returned to the client in bytes
referer †     The URI where the request originated
user-agent †  User agent string sent by the client

† Combined Log Format only.

Example 1. Parsing log events in Common Log Format

This configuration reads Apache HTTP Server access logs with the im_file input module. It then uses a regular expression to parse the $raw_event field and extract additional fields.

nxlog.conf
<Input access_log>
    Module    im_file
    File      '/var/log/apache2/access.log'
    <Exec>
        # The regular expression that parses log events.
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)/
        {
            # This block creates fields from the captured groups.
            $Hostname = $1;
            if $2 != '-' $AccountName = $2;
            # The parsedate() function converts the timestamp string to datetime.
            $EventTime = parsedate($3);
            $HTTPMethod = $4;
            $HTTPURL = $5;
            $HTTPResponseStatus = $6;
            if $7 != '-' $FileSize = $7;
        }
    </Exec>
</Input>
Input sample
192.168.2.20 - - [28/Jul/2023:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395

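When the NXLog Agent configuration above processes this web access event, it adds the following fields to the log record in addition to the core fields. $AccountName is not set because the authuser field contains a hyphen. $EventTime is shown in the agent's local time zone (UTC+02:00 in this example), so 10:27:10 -0300 appears as 15:27:10 +02:00.

Field                Value
$Hostname            192.168.2.20
$EventTime           2023-07-28T15:27:10.000000+02:00
$HTTPMethod          GET
$HTTPURL             /cgi-bin/try/
$HTTPResponseStatus  200
$FileSize            3395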
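
The configuration above stores the status code and the response size as strings, and it silently passes through lines that do not match the pattern. Where numeric fields or visibility into unparsed lines are useful, a variation along the following lines may help. It is a sketch rather than part of the original example: it assumes the captured status and size values are purely numeric, and the warning text is illustrative.

```
<Input access_log>
    Module    im_file
    File      '/var/log/apache2/access.log'
    <Exec>
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)/
        {
            $Hostname = $1;
            if $2 != '-' $AccountName = $2;
            $EventTime = parsedate($3);
            $HTTPMethod = $4;
            $HTTPURL = $5;
            # Assumption: convert the numeric captures to integers
            $HTTPResponseStatus = integer($6);
            if $7 != '-' $FileSize = integer($7);
        }
        else
        {
            # Assumption: report lines that do not match instead of ignoring them
            log_warning('Unparsed access log line: ' + $raw_event);
        }
    </Exec>
</Input>
```

With this variation, the unmatched record still continues through the route unchanged; only a warning is written to the agent's own log.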

Example 2. Parsing log events in Combined Log Format

This configuration reads Apache HTTP Server access logs with the im_file input module. It then uses a regular expression to parse the $raw_event field and extract additional fields, including the referer and user agent.

nxlog.conf
<Input access_log>
    Module    im_file
    File      '/var/log/apache2/access.log'
    <Exec>
        # The regular expression that parses log events.
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)\ \"([^\"]+)\"
                          \ \"([^\"]+)\"/
        {
            # This block creates fields from the captured groups.
            $Hostname = $1;
            if $2 != '-' $AccountName = $2;
            # The parsedate() function converts the timestamp string to datetime.
            $EventTime = parsedate($3);
            $HTTPMethod = $4;
            $HTTPURL = $5;
            $HTTPResponseStatus = $6;
            if $7 != '-' $FileSize = $7;
            if $8 != '-' $HTTPReferer = $8;
            if $9 != '-' $HTTPUserAgent = $9;
        }
    </Exec>
</Input>
Input sample
192.168.1.10 - jdoe [10/Oct/2023:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/5.0 [en] (Windows NT 10.0; Win64; x64)"

When the NXLog Agent configuration above processes this web access event, it adds the following fields to the log record in addition to the core fields. Compared with the previous example, the record also contains $HTTPReferer and $HTTPUserAgent, and $AccountName is set because the authuser field holds a value. $EventTime is again shown in the agent's local time zone (UTC+02:00 in this example).

Field                Value
$Hostname            192.168.1.10
$EventTime           2023-10-10T22:55:36.000000+02:00
$AccountName         jdoe
$HTTPMethod          GET
$HTTPURL             /apache_pb.gif
$HTTPResponseStatus  200
$FileSize            2326
$HTTPReferer         http://www.example.com/start.html
$HTTPUserAgent       Mozilla/5.0 [en] (Windows NT 10.0; Win64; x64)
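
The parsed fields exist only in the log record, so they are lost if the record is written out as the original $raw_event line. One way to keep them is to serialize each record as JSON with the xm_json extension before writing it. The sketch below assumes the access_log input instance defined above; the instance names json, out, and r and the output path are illustrative and not part of the original example.

```
<Extension json>
    Module    xm_json
</Extension>

<Output out>
    Module    om_file
    # Hypothetical destination path
    File      '/tmp/access_log.json'
    # to_json() replaces $raw_event with a JSON representation of the record's fields
    Exec      to_json();
</Output>

<Route r>
    Path      access_log => out
</Route>
```

Each line of the output file then holds one JSON object containing the fields listed in the table above together with the core fields.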

See the Apache HTTP Server integration guide for more information and examples.