---
title: Parsing NGINX log in Splunk
excerpt: Configure regex in field extractor to create relevant fields
date: 2021-12-25
tags:
---
For web server access logs, Splunk has built-in support for Apache only. Splunk also has a feature called field extractor, which is powered by delimiters and regex and lets users add new fields for use in search queries. This post only covers the regex patterns to parse nginx logs; for instructions on field extractor itself, I recommend perusing the official documentation.
To illustrate, say we have a log format like this:
{id} "{http.request.host}" "{http.request.header.user-agent}"
An example log is:
123 "example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"
While you could search for a specific keyword, e.g. attempts of {% post_link log4shell-log4j-unbound-dns 'Log4shell exploit' %}, there are no fields, so you cannot run commands such as `table` or `stats` on the search results.
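Outside Splunk, the difference can be sketched in a few lines of Python: a regex turns the example line into named fields, after which a per-host count (roughly what `stats count by host` does) becomes possible. Python's `re` spells named groups `(?P<name>...)`, and the field names `id`, `host` and `user_agent` are my own labels for illustration, not names Splunk assigns.

```python
import re
from collections import Counter

# Hypothetical field names for the format: {id} "{http.request.host}" "{http.request.header.user-agent}"
PATTERN = re.compile(r'(?P<id>\d+)\s"(?P<host>[^"]*)"\s"(?P<user_agent>[^"]*)"')

events = [
    '123 "example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"',
]

# Without fields: only a keyword search over the raw text is possible.
keyword_hits = [event for event in events if "Firefox" in event]

# With fields: each event becomes a dict, so we can aggregate on a field.
parsed = []
for event in events:
    match = PATTERN.search(event)
    if match:
        parsed.append(match.groupdict())

count_by_host = Counter(fields["host"] for fields in parsed)
print(count_by_host)  # Counter({'example.com': 1})
```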
Splunk understands the Apache log format because it already includes the regex patterns needed to parse the relevant fields of each line. Choosing a source type is equivalent to choosing a log format. If a format is not in the default list, we can either use an add-on or create new fields using field extractor. There is a Splunk add-on for nginx, and I suggest trying it before resorting to field extractor.
I created five patterns that cover most of the nginx events I have encountered at work. Refer to the documentation for the supported regex syntax.
A field is extracted through a named capturing group:
`(?<field_name>capture pattern)`
For example, `(?<month>\w+)` searches for one or more (`+`) alphanumeric characters (`\w`) and names the field `month`. I opted for looser matching, mostly using the unbounded quantifier `+` instead of a stricter range of occurrences `{M,N}`, even where I knew the exact pattern of a field. I found that some fields stray slightly from the expected pattern, so looser matching tends to catch more events without picking up unwanted ones.
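The trade-off between `+` and `{M,N}` can be tried locally with a small Python sketch. Splunk's field extractor accepts PCRE-style `(?<name>...)` groups, while Python's `re` module only understands `(?P<name>...)`, so the hypothetical `to_python` helper rewrites the group syntax first. The second sample line assumes a classic syslog timestamp, which pads a single-digit day with a space.

```python
import re


def to_python(pcre_pattern: str) -> str:
    """Rewrite named groups (?<name>...) into Python's (?P<name>...) syntax.

    Hypothetical helper: lookbehinds such as (?<= and (?<! are left alone
    because they are not followed by a letter or underscore.
    """
    return re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", pcre_pattern)


# Loose: unbounded "+" quantifiers, as used throughout this post.
loose = re.compile(to_python(r"(?<month>\w+)\s+(?<day>\d+)"))
# Strict: exact occurrence counts {M,N}, for comparison.
strict = re.compile(to_python(r"(?<month>\w{3})\s+(?<day>\d{2})"))

samples = [
    "Dec 24 01:23:45",  # two-digit day
    "Dec  5 01:23:45",  # single-digit day, space-padded (assumed syslog style)
]

results = {
    line: {"loose": bool(loose.match(line)), "strict": bool(strict.match(line))}
    for line in samples
}
for line, outcome in results.items():
    print(repr(line), outcome)
```

Both patterns accept the first line, but only the loose one accepts the padded day, which is the kind of slight deviation that made me prefer `+`.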
Web request
Regex
`(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<remote_ip>[\d\.]+)(?:\s\d+\s\S+\s\S+\s)\[(?<time_local>\S+)\s(?<timezone>[\+\-]\d{4})\]\s"(?<http_method>\w+)\s(?<http_path>.+)\s(?<http_version>HTTP/\d\.\d)"\s(?<http_status>\d{3})\s(?:\d+)\s"(?<request_url>.[^"]*)"\s"(?<http_user_agent>.[^"]*)"\s(?<server_ip>[\d\.]+)\:(?<server_port>\d+)(?:\s\d+\s\d+\s)(?<ssl_version>\S+)\s(?<ssl_cipher>\S+)\s(?<http_cookie>\S+)`
Event
`Dec 24 01:23:45 192.168.0.2 nginx: 1.2.3.4 55763 - - [24/Dec/2021:01:23:45 +0000] "GET /page.html HTTP/2.0" 200 494 "https://www.example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0" 192.168.1.2:8080 123 4 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 abcdef .`
Fields
Field | Value | Regex | Explanation |
---|---|---|---|
month | Dec | `(?<month>\w+)` | One or more alphanumeric characters |
day | 24 | `(?<day>\d+)` | One or more digits |
time | 01:23:45 | `(?<time>[\d\:]+)` | One or more digits or colons |
proxy_ip | 192.168.0.2 | `(?<proxy_ip>[\d\.]+)` | One or more digits or dots |
remote_ip | 1.2.3.4 | `(?<remote_ip>[\d\.]+)` | |
time_local | 24/Dec/2021:01:23:45 | `(?<time_local>\S+)` | One or more non-whitespace characters |
timezone | +0000 | `(?<timezone>[\+\-]\d{4})` | Four digits with a plus or minus prefix |
http_method | GET | `(?<http_method>\w+)` | |
http_path | /page.html | `(?<http_path>.+)` | One or more of any character |
http_version | HTTP/2.0 | `(?<http_version>HTTP/\d\.\d)` | "HTTP/", a digit, a dot and a digit |
http_status | 200 | `(?<http_status>\d{3})` | Three digits |
request_url | https://www.example.com | `(?<request_url>.[^"]*)` | Any one character, then zero or more characters other than double quote |
http_user_agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0 | `(?<http_user_agent>.[^"]*)` | |
server_ip | 192.168.1.2 | `(?<server_ip>[\d\.]+)` | |
server_port | 8080 | `(?<server_port>\d+)` | |
ssl_version | TLSv1.2 | `(?<ssl_version>\S+)` | |
ssl_cipher | ECDHE-RSA-AES128-GCM-SHA256 | `(?<ssl_cipher>\S+)` | |
http_cookie | abcdef | `(?<http_cookie>\S+)` | |
nginx is configured as a reverse proxy: `proxy_ip` is the IP address of nginx itself, whereas `server_ip` is that of the upstream server.
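To sanity-check the web request pattern before pasting it into field extractor, it can be run against the sample event in Python. This is a sketch under two assumptions: the hypothetical `to_python` helper converts Splunk's `(?<name>...)` groups into Python's `(?P<name>...)` form, and the timezone group uses the `[\+\-]\d{4}` form from the field table so that negative offsets also match.

```python
import re


def to_python(pcre_pattern: str) -> str:
    # Hypothetical helper: Splunk/PCRE (?<name>...) -> Python (?P<name>...).
    return re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", pcre_pattern)


# Web request pattern from this post, split into adjacent literals for readability.
WEB_REQUEST = to_python(
    r'(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)'
    r'(?<remote_ip>[\d\.]+)(?:\s\d+\s\S+\s\S+\s)\[(?<time_local>\S+)\s(?<timezone>[\+\-]\d{4})\]\s'
    r'"(?<http_method>\w+)\s(?<http_path>.+)\s(?<http_version>HTTP/\d\.\d)"\s(?<http_status>\d{3})\s(?:\d+)\s'
    r'"(?<request_url>.[^"]*)"\s"(?<http_user_agent>.[^"]*)"\s(?<server_ip>[\d\.]+)\:(?<server_port>\d+)'
    r'(?:\s\d+\s\d+\s)(?<ssl_version>\S+)\s(?<ssl_cipher>\S+)\s(?<http_cookie>\S+)'
)

EVENT = (
    'Dec 24 01:23:45 192.168.0.2 nginx: 1.2.3.4 55763 - - [24/Dec/2021:01:23:45 +0000] '
    '"GET /page.html HTTP/2.0" 200 494 "https://www.example.com" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0" '
    '192.168.1.2:8080 123 4 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 abcdef .'
)

match = re.search(WEB_REQUEST, EVENT)
if match is None:
    raise SystemExit("web request pattern did not match")

fields = match.groupdict()
for name, value in fields.items():
    print(f"{name:<16} {value}")
```

Because `.+` in `http_path` is greedy, the engine backtracks until `\sHTTP/\d\.\d"` fits, which works as long as the path itself does not contain that sequence.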
Proxy request
Regex
`(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\sclient\s)(?<remote_ip>[\d\.]+)\:(?<remote_port>\d+)(?:\sconnected\sto\s)(?<server_ip>[\d\.]+)\:(?<server_port>\d+)`
Event
`Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] 1776#1776: *114333142 client 1.2.3.4:19802 connected to 192.168.1.2:8080`
Fields
Field | Value | Regex | Explanation |
---|---|---|---|
month | Dec | `(?<month>\w+)` | |
day | 24 | `(?<day>\d+)` | |
time | 01:23:45 | `(?<time>[\d\:]+)` | |
proxy_ip | 192.168.0.2 | `(?<proxy_ip>[\d\.]+)` | |
year | 2021 | `(?<year>\d{4})` | |
nmonth | 12 | `(?<nmonth>\d{2})` | |
log_level | info | `(?<log_level>\w+)` | |
remote_ip | 1.2.3.4 | `(?<remote_ip>[\d\.]+)` | |
remote_port | 19802 | `(?<remote_port>\d+)` | |
server_ip | 192.168.1.2 | `(?<server_ip>[\d\.]+)` | |
server_port | 8080 | `(?<server_port>\d+)` | |
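The proxy request pattern can be checked the same way. In this sketch I write the sample event with a space after `1776#1776:`, which is how nginx formats its error log and what the `\:\s\*` part of the pattern expects. Combining `year` and `nmonth` with the syslog `day` and `time` into one timestamp is only one possible use of those fields and is my own illustration; `to_python` is a hypothetical helper.

```python
import re
from datetime import datetime


def to_python(pcre_pattern: str) -> str:
    # Hypothetical helper: Splunk/PCRE (?<name>...) -> Python (?P<name>...).
    return re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", pcre_pattern)


PROXY_REQUEST = to_python(
    r'(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)'
    r'(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\]'
    r'(?:\s\d+#\d+\:\s\*\d+\sclient\s)(?<remote_ip>[\d\.]+)\:(?<remote_port>\d+)'
    r'(?:\sconnected\sto\s)(?<server_ip>[\d\.]+)\:(?<server_port>\d+)'
)

EVENT = (
    'Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] '
    '1776#1776: *114333142 client 1.2.3.4:19802 connected to 192.168.1.2:8080'
)

match = re.search(PROXY_REQUEST, EVENT)
if match is None:
    raise SystemExit("proxy request pattern did not match")
fields = match.groupdict()

# Illustration only: build a full timestamp, since the syslog prefix has no year.
timestamp = datetime.strptime(
    f"{fields['year']}-{fields['nmonth']}-{fields['day']} {fields['time']}",
    "%Y-%m-%d %H:%M:%S",
)
print(f"{fields['remote_ip']}:{fields['remote_port']} -> "
      f"{fields['server_ip']}:{fields['server_port']} at {timestamp.isoformat()}")
```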
Upstream error response
Regex
`(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\s)(?<upstream_error>.[^,]*)(?:,\sclient\:\s)(?<remote_ip>[\d\.]+)(?:,\sserver\:\s)(?<server_host>.[^,]*)(?:,\srequest\:\s")(?<http_method>\w+)\s(?<http_path>\S+)\s(?<http_version>HTTP/\d\.\d)(?:",\supstream\:\s")(?<upstream_url>.[^"]*)",\shost\:\s"(?<upstream_host>.[^"]*)`
Event
`Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [error] 1776#1776: *71197740 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "POST /api/path HTTP/2.0", upstream: "http://192.168.1.2:8080/api/path", host: "example.com"`
Fields
Field | Value | Regex | Explanation |
---|---|---|---|
month | Dec | `(?<month>\w+)` | |
day | 24 | `(?<day>\d+)` | |
time | 01:23:45 | `(?<time>[\d\:]+)` | |
proxy_ip | 192.168.0.2 | `(?<proxy_ip>[\d\.]+)` | |
year | 2021 | `(?<year>\d{4})` | |
nmonth | 12 | `(?<nmonth>\d{2})` | |
log_level | error | `(?<log_level>\w+)` | |
upstream_error | upstream timed out (110: Connection timed out) while reading response header from upstream | `(?<upstream_error>.[^,]*)` | Any one character, then zero or more characters other than comma |
remote_ip | 1.2.3.4 | `(?<remote_ip>[\d\.]+)` | |
server_host | example.com | `(?<server_host>.[^,]*)` | |
http_method | POST | `(?<http_method>\w+)` | |
http_path | /api/path | `(?<http_path>\S+)` | |
http_version | HTTP/2.0 | `(?<http_version>HTTP/\d\.\d)` | |
upstream_url | http://192.168.1.2:8080/api/path | `(?<upstream_url>.[^"]*)` | |
upstream_host | example.com | `(?<upstream_host>.[^"]*)` | |
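A sketch for the upstream error pattern follows. I wrote the sample event with a space after `1776#1776:`, after `",` and after `host:`, matching nginx's usual error log layout and the `\s` the pattern requires at those spots. The `to_python` helper is hypothetical, and printing only a subset of fields is just for brevity.

```python
import re


def to_python(pcre_pattern: str) -> str:
    # Hypothetical helper: Splunk/PCRE (?<name>...) -> Python (?P<name>...).
    return re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", pcre_pattern)


UPSTREAM_ERROR = to_python(
    r'(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)'
    r'(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\]'
    r'(?:\s\d+#\d+\:\s\*\d+\s)(?<upstream_error>.[^,]*)(?:,\sclient\:\s)(?<remote_ip>[\d\.]+)'
    r'(?:,\sserver\:\s)(?<server_host>.[^,]*)(?:,\srequest\:\s")(?<http_method>\w+)\s'
    r'(?<http_path>\S+)\s(?<http_version>HTTP/\d\.\d)(?:",\supstream\:\s")(?<upstream_url>.[^"]*)'
    r'",\shost\:\s"(?<upstream_host>.[^"]*)'
)

EVENT = (
    'Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [error] 1776#1776: *71197740 '
    'upstream timed out (110: Connection timed out) while reading response header from upstream, '
    'client: 1.2.3.4, server: example.com, request: "POST /api/path HTTP/2.0", '
    'upstream: "http://192.168.1.2:8080/api/path", host: "example.com"'
)

match = re.search(UPSTREAM_ERROR, EVENT)
if match is None:
    raise SystemExit("upstream error pattern did not match")
fields = match.groupdict()

for name in ("log_level", "upstream_error", "http_method", "http_path", "upstream_url"):
    print(f"{name:<15} {fields[name]}")
```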
Upstream epoll error
Regex
`(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\s)(?<upstream_error>[^,]*,[^,]*)(?:,\sclient\:\s)(?<remote_ip>[\d\.]+)(?:,\sserver\:\s)(?<server_host>.[^,]*)(?:,\srequest\:\s")(?<http_method>\w+)\s(?<http_path>\S+)\s(?<http_version>HTTP/\d\.\d)(?:",\supstream\:\s")(?<upstream_url>.[^"]*)(?:",\shost\:\s")(?<upstream_host>.[^"]*)`
Event
`Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] 13199#13199: *81574833 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while connecting to upstream, client: 1.2.3.4, server: example.com, request: "GET /page.html HTTP/1.1", upstream: "http://192.168.1.2/page.html", host: "example.com"`
Fields
Field | Value | Regex | Explanation |
---|---|---|---|
month | Dec | `(?<month>\w+)` | |
day | 24 | `(?<day>\d+)` | |
time | 01:23:45 | `(?<time>[\d\:]+)` | |
proxy_ip | 192.168.0.2 | `(?<proxy_ip>[\d\.]+)` | |
year | 2021 | `(?<year>\d{4})` | |
nmonth | 12 | `(?<nmonth>\d{2})` | |
log_level | info | `(?<log_level>\w+)` | |
upstream_error | epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while connecting to upstream | `(?<upstream_error>[^,]*,[^,]*)` | Two runs of non-comma characters separated by a single comma |
remote_ip | 1.2.3.4 | `(?<remote_ip>[\d\.]+)` | |
server_host | example.com | `(?<server_host>.[^,]*)` | |
http_method | GET | `(?<http_method>\w+)` | |
http_path | /page.html | `(?<http_path>\S+)` | |
http_version | HTTP/1.1 | `(?<http_version>HTTP/\d\.\d)` | |
upstream_url | http://192.168.1.2/page.html | `(?<upstream_url>.[^"]*)` | |
upstream_host | example.com | `(?<upstream_host>.[^"]*)` | |
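The reason for `[^,]*,[^,]*` shows up when both variants are run against the epoll event: the message itself contains a comma, so the single-run `.[^,]*` used for the upstream error response stops at that comma and the whole match fails. In this sketch the sample event has a space after `upstream:`, as nginx writes it and as the pattern's `upstream\:\s"` expects; `to_python` is a hypothetical helper.

```python
import re


def to_python(pcre_pattern: str) -> str:
    # Hypothetical helper: Splunk/PCRE (?<name>...) -> Python (?P<name>...).
    return re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", pcre_pattern)


# Epoll error pattern from this post.
EPOLL_PCRE = (
    r'(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)'
    r'(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\]'
    r'(?:\s\d+#\d+\:\s\*\d+\s)(?<upstream_error>[^,]*,[^,]*)(?:,\sclient\:\s)(?<remote_ip>[\d\.]+)'
    r'(?:,\sserver\:\s)(?<server_host>.[^,]*)(?:,\srequest\:\s")(?<http_method>\w+)\s'
    r'(?<http_path>\S+)\s(?<http_version>HTTP/\d\.\d)(?:",\supstream\:\s")(?<upstream_url>.[^"]*)'
    r'(?:",\shost\:\s")(?<upstream_host>.[^"]*)'
)
# Same pattern, but with the single comma-free run used for the upstream error response.
SINGLE_RUN_PCRE = EPOLL_PCRE.replace(
    r"(?<upstream_error>[^,]*,[^,]*)", r"(?<upstream_error>.[^,]*)"
)

EVENT = (
    'Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] 13199#13199: *81574833 '
    'epoll_wait() reported that client prematurely closed connection, so upstream connection '
    'is closed too while connecting to upstream, client: 1.2.3.4, server: example.com, '
    'request: "GET /page.html HTTP/1.1", upstream: "http://192.168.1.2/page.html", host: "example.com"'
)

two_run = re.search(to_python(EPOLL_PCRE), EVENT)
single_run = re.search(to_python(SINGLE_RUN_PCRE), EVENT)

print("two runs:  ", two_run.group("upstream_error") if two_run else None)
print("single run:", single_run)  # no match: the message contains a comma
```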
Upstream epoll error with referrer
Regex
`(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\s)(?<upstream_error>[^,]*,[^,]*)(?:,\sclient\:\s)(?<remote_ip>[\d\.]+)(?:,\sserver\:\s)(?<server_host>.[^,]*)(?:,\srequest\:\s")(?<http_method>\w+)\s(?<http_path>\S+)\s(?<http_version>HTTP/\d\.\d)(?:",\supstream\:\s")(?<upstream_url>.[^"]*)(?:",\shost\:\s")(?<upstream_host>.[^"]*)(?:",\sreferrer\:\s")(?<referrer>.[^"]*)`
Event
`Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] 1776#1776: *71220252 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 1.2.3.4, server: example.com, request: "GET /page.html HTTP/1.1", upstream: "http://192.168.1.2:8080/page.html", host: "example.com", referrer: "https://example.com"`
Fields
Field | Value | Regex | Explanation |
---|---|---|---|
month | Dec | `(?<month>\w+)` | |
day | 24 | `(?<day>\d+)` | |
time | 01:23:45 | `(?<time>[\d\:]+)` | |
proxy_ip | 192.168.0.2 | `(?<proxy_ip>[\d\.]+)` | |
year | 2021 | `(?<year>\d{4})` | |
nmonth | 12 | `(?<nmonth>\d{2})` | |
log_level | info | `(?<log_level>\w+)` | |
upstream_error | epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream | `(?<upstream_error>[^,]*,[^,]*)` | Two runs of non-comma characters separated by a single comma |
remote_ip | 1.2.3.4 | `(?<remote_ip>[\d\.]+)` | |
server_host | example.com | `(?<server_host>.[^,]*)` | |
http_method | GET | `(?<http_method>\w+)` | |
http_path | /page.html | `(?<http_path>\S+)` | |
http_version | HTTP/1.1 | `(?<http_version>HTTP/\d\.\d)` | |
upstream_url | http://192.168.1.2:8080/page.html | `(?<upstream_url>.[^"]*)` | |
upstream_host | example.com | `(?<upstream_host>.[^"]*)` | |
referrer | https://example.com | `(?<referrer>.[^"]*)` | |
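A variation that is not one of the five patterns in this post: wrapping the referrer part in an optional group `(?:...)?` would let a single pattern cover epoll events both with and without a referrer, with `referrer` simply left empty when absent. This sketch reuses the epoll pattern from this post, writes both sample events with nginx's usual spacing (a space after `1776#1776:` and after `upstream:`), and relies on the hypothetical `to_python` helper.

```python
import re


def to_python(pcre_pattern: str) -> str:
    # Hypothetical helper: Splunk/PCRE (?<name>...) -> Python (?P<name>...).
    return re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", pcre_pattern)


# Epoll error pattern from this post (everything up to upstream_host).
EPOLL_BASE = (
    r'(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)'
    r'(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\]'
    r'(?:\s\d+#\d+\:\s\*\d+\s)(?<upstream_error>[^,]*,[^,]*)(?:,\sclient\:\s)(?<remote_ip>[\d\.]+)'
    r'(?:,\sserver\:\s)(?<server_host>.[^,]*)(?:,\srequest\:\s")(?<http_method>\w+)\s'
    r'(?<http_path>\S+)\s(?<http_version>HTTP/\d\.\d)(?:",\supstream\:\s")(?<upstream_url>.[^"]*)'
    r'(?:",\shost\:\s")(?<upstream_host>.[^"]*)'
)
REFERRER = r'(?:",\sreferrer\:\s")(?<referrer>.[^"]*)'

post_pattern = re.compile(to_python(EPOLL_BASE + REFERRER))  # as written in this post
optional_referrer = re.compile(to_python(EPOLL_BASE + "(?:" + REFERRER + ")?"))  # my variation

EVENTS = {
    "with referrer": (
        'Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] 1776#1776: *71220252 '
        'epoll_wait() reported that client prematurely closed connection, so upstream connection '
        'is closed too while sending request to upstream, client: 1.2.3.4, server: example.com, '
        'request: "GET /page.html HTTP/1.1", upstream: "http://192.168.1.2:8080/page.html", '
        'host: "example.com", referrer: "https://example.com"'
    ),
    "without referrer": (
        'Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] 13199#13199: *81574833 '
        'epoll_wait() reported that client prematurely closed connection, so upstream connection '
        'is closed too while connecting to upstream, client: 1.2.3.4, server: example.com, '
        'request: "GET /page.html HTTP/1.1", upstream: "http://192.168.1.2/page.html", host: "example.com"'
    ),
}

results = {}
for label, event in EVENTS.items():
    post_match = post_pattern.search(event)
    optional_match = optional_referrer.search(event)
    results[label] = {
        "post_pattern_matches": post_match is not None,
        "optional_matches": optional_match is not None,
        # An optional group that did not participate yields None.
        "referrer": optional_match.group("referrer") if optional_match else None,
    }
    print(label, results[label])
```

Whether this beats keeping two separate extractions is a matter of taste; separate patterns keep each one easier to read in field extractor.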