Parse logs in Common and Combined Log Format
The Common Log Format (or NCSA Common Log Format) and Combined Log Format are text-based log formats most commonly used for web server access logs. The two formats are almost identical, except that the Combined Log Format adds two fields: referer and user-agent.
Below, we provide examples of collecting and parsing these formats with NXLog Agent.
Common Log Format syntax:
host ident authuser [date] "request" status size
Combined Log Format syntax:
host ident authuser [date] "request" status size "referer" "user-agent"
A hyphen (-) indicates the field is not available.
Field | Description |
---|---|
host | IP address of the client |
ident | RFC 1413 identity of the client |
authuser | The user accessing the document (not applicable for public documents) |
date | Timestamp of the request |
request | Request line received from the client. It includes the HTTP method, the resource requested, and the HTTP protocol version. |
status | HTTP status code returned to the client |
size | Size of the object returned to the client in bytes |
referer † | The URI where the request originated |
user-agent † | User agent string sent by the client |
† Combined Log Format only.
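To make the field layout concrete, here is a minimal Python sketch that splits a log line into the fields listed in the table. It is an illustration outside NXLog Agent, not part of its configuration: the `LOG_PATTERN` name, the `parse_line` helper, and the choice to represent a hyphen as `None` are all assumptions made for this example.

```python
import re

# Illustrative pattern (not taken from NXLog Agent): one named group per field.
# The referer and user-agent groups are optional, so the same pattern accepts
# both Common and Combined Log Format lines.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\S+) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)")?$'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    # Groups that did not participate (Common Log Format lines have no
    # referer or user-agent) are dropped; a hyphen is mapped to None.
    return {
        name: (None if value == "-" else value)
        for name, value in match.groupdict().items()
        if value is not None
    }


sample = ('192.168.2.20 - - [28/Jul/2023:10:27:10 -0300] '
          '"GET /cgi-bin/try/ HTTP/1.0" 200 3395')
for name, value in parse_line(sample).items():
    print(f"{name}: {value}")
```

For this Common Log Format sample, `ident` and `authuser` come back as `None` and no `referer` or `user_agent` key is present.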
This configuration reads Apache HTTP Server access logs in Common Log Format with the im_file input module.
It then uses a regular expression to parse the $raw_event field and extract additional fields.
<Input access_log>
    Module  im_file
    File    '/var/log/apache2/access.log'
    <Exec>
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)/ (1)
        { (2)
            $Hostname = $1;
            if $2 != '-' $AccountName = $2;
            $EventTime = parsedate($3); (3)
            $HTTPMethod = $4;
            $HTTPURL = $5;
            $HTTPResponseStatus = $6;
            if $7 != '-' $FileSize = $7;
        }
    </Exec>
</Input>
1 | The regular expression that parses log events. |
2 | This block creates fields from the captured groups. |
3 | The parsedate() function converts the timestamp string to datetime. |
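The expression above starts with `(?x)`, the extended-mode flag in which unescaped whitespace and line breaks inside the pattern are ignored; that is why every literal space is written as `\ `. As a rough illustration of how the capture groups line up with the fields, the following Python sketch mirrors the same expression with `re.VERBOSE`. The `CLF` name and the `fields` dictionary are assumptions made for this example, and the timestamp is kept as a string rather than converted as parsedate() does.

```python
import re

# Python counterpart of the expression above. re.VERBOSE plays the role of
# (?x): unescaped whitespace is ignored, so literal spaces are written "\ ".
CLF = re.compile(r'''
    ^(\S+)\ \S+\ (\S+)\     # group 1: host; ident is skipped; group 2: authuser
    \[([^\]]+)\]\           # group 3: timestamp between square brackets
    "(\S+)\ (.+)            # group 4: HTTP method; group 5: requested resource
    \ HTTP/\d\.\d"\         # protocol version, matched but not captured
    (\S+)\ (\S+)            # group 6: status; group 7: size
''', re.VERBOSE)

line = ('192.168.2.20 - - [28/Jul/2023:10:27:10 -0300] '
        '"GET /cgi-bin/try/ HTTP/1.0" 200 3395')

fields = {}
m = CLF.match(line)
if m:
    fields["Hostname"] = m.group(1)
    if m.group(2) != "-":            # skip the field when it is not available
        fields["AccountName"] = m.group(2)
    fields["EventTime"] = m.group(3)  # left as a string in this sketch
    fields["HTTPMethod"] = m.group(4)
    fields["HTTPURL"] = m.group(5)
    fields["HTTPResponseStatus"] = m.group(6)
    if m.group(7) != "-":
        fields["FileSize"] = m.group(7)

print(fields)
```

Because the sample line has a hyphen in the authuser position, no `AccountName` entry is created, which matches the conditional assignment in the configuration.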
192.168.2.20 - - [28/Jul/2023:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395
When the NXLog Agent configuration above processes this web access event, it adds the following fields to the log record in addition to the core fields.
Field | Value |
---|---|
$EventTime | 2023-07-28T15:27:10.000000+02:00 |
$HTTPMethod | GET |
$HTTPURL | /cgi-bin/try/ |
$HTTPResponseStatus | 200 |
$FileSize | 3395 |
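The $EventTime value reads 15:27:10 with a +02:00 offset, while the log line records 10:27:10 with a -0300 offset. Both denote the same instant; the sample output appears to be rendered in the time zone of the host running the agent. The Python sketch below checks the arithmetic, assuming that host is at UTC+02:00. It only illustrates the conversion and says nothing about how parsedate() is implemented.

```python
from datetime import datetime, timedelta, timezone

# Timestamp exactly as it appears between the brackets in the sample event.
raw = "28/Jul/2023:10:27:10 -0300"

# %z parses the numeric UTC offset, producing a timezone-aware datetime.
parsed = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")

# Assumption: the host that produced the sample output runs at UTC+02:00.
local = parsed.astimezone(timezone(timedelta(hours=2)))

print(local.isoformat(timespec="microseconds"))
# prints 2023-07-28T15:27:10.000000+02:00
```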
This configuration reads Apache HTTP Server access logs in Combined Log Format with the im_file input module.
It then uses a regular expression to parse the $raw_event field and extract additional fields.
<Input access_log>
    Module  im_file
    File    '/var/log/apache2/access.log'
    <Exec>
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)\ \"([^\"]+)\"
                          \ \"([^\"]+)\"/ (1)
        { (2)
            $Hostname = $1;
            if $2 != '-' $AccountName = $2;
            $EventTime = parsedate($3); (3)
            $HTTPMethod = $4;
            $HTTPURL = $5;
            $HTTPResponseStatus = $6;
            if $7 != '-' $FileSize = $7;
            if $8 != '-' $HTTPReferer = $8;
            if $9 != '-' $HTTPUserAgent = $9;
        }
    </Exec>
</Input>
1 | The regular expression that parses log events. |
2 | This block creates fields from the captured groups. |
3 | The parsedate() function converts the timestamp string to datetime. |
192.168.1.10 - jdoe [10/Oct/2023:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/5.0 [en] (Windows NT 10.0; Win64; x64)"
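When the NXLog Agent configuration above processes this web access event, it adds the following fields to the log record in addition to the core fields. Notice the two extra fields compared to the previous example, $HTTPReferer and $HTTPUserAgent. $AccountName also appears because this event includes an authenticated user.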
Field | Value |
---|---|
$EventTime | 2023-10-10T22:55:36.000000+02:00 |
$AccountName | jdoe |
$HTTPMethod | GET |
$HTTPURL | /apache_pb.gif |
$HTTPResponseStatus | 200 |
$FileSize | 2326 |
$HTTPReferer | http://www.example.com/start.html |
$HTTPUserAgent | Mozilla/5.0 [en] (Windows NT 10.0; Win64; x64) |
See the Apache HTTP Server integration guide for more information and examples.