diff --git a/source/_posts/nginx-splunk-field-extractor.md b/source/_posts/nginx-splunk-field-extractor.md
new file mode 100644
index 0000000..820c04d
--- /dev/null
+++ b/source/_posts/nginx-splunk-field-extractor.md
@@ -0,0 +1,208 @@
+---
+title: Parsing NGINX log in Splunk
+excerpt: Configure regex in field extractor to create relevant fields
+date: 2021-12-25
+tags:
+- splunk
+- nginx
+---
+
+For web server's access log, Splunk has built-in support for Apache only. Splunk has a feature called field extractor. It is powered by delimiter and regex, and enables user to add new [_fields_](https://docs.splunk.com/Documentation/Splunk/8.2.3/Knowledge/Aboutfields) to be used in a search query. This post will only covers the regex patterns to parse nginx log, for instruction on field extractor, I recommend perusing the [official documentation](https://docs.splunk.com/Documentation/Splunk/8.2.3/Knowledge/ExtractfieldsinteractivelywithIFX).
+
+To illustrate, say we have a log format like this:
+
+```
+{id} "{http.request.host}" "{http.request.header.user-agent}"
+```
+
+An example log is:
+
+```
+123 "example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"
+```
+
+While you could search for a specific keyword, e.g. attempts of {% post_link log4shell-log4j-unbound-dns 'Log4shell exploit' %}, since there are no fields, you cannot run any statistics like [`table`](https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Table) or [`stats`](https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/stats) on the search results.
+
+Splunk is able to understand Apache log format because its field extractor already includes the necessary regex patterns to parse the relevant fields of each line in a log. Choosing a source type is equivalent of choosing a log format. If a format is not listed in [the default list](https://docs.splunk.com/Documentation/Splunk/8.2.3/Data/Listofpretrainedsourcetypes), we can either use an add-on or create new fields using field extractor. There is a Splunk [add-on](https://docs.splunk.com/Documentation/AddOns/latest/NGINX) for nginx and I suggest to try it before resorting to field extractor.
+
+I create five patterns which cover most of the nginx events I encountered during my work. Refer to the documentation for [supported syntax](https://docs.splunk.com/Documentation/Splunk/8.2.3/Knowledge/AboutSplunkregularexpressions).
+
+A field is extracted through "capturing group".
+
+```
+(?<field_name>capture pattern)
+```
+
+For example, `(?<month>\w+)` searches for one or more (`+`) alphanumeric characters (`\w`) and names the field as `month`. I opted for lazier matching, mostly using unbounded quantifier `+` instead of a stricter range of occurrences `{M,N}` despite knowing the exact pattern of a field. I found some fields may stray off slightly from the expected pattern, so a lazier matching tends match more events without matching unwanted's.
+
+## Web request
+
+### Regex
+
+```
+(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<remote_ip>[\d\.]+)(?:\s\d+\s\S+\s\S+\s)\[(?<time_local>\S+)\s(?<timezone>\+\d{4})\]\s"(?<http_method>\w+)\s(?<http_path>.+)\s(?<http_version>HTTP/\d\.\d)"\s(?<http_status>\d{3})\s(?:\d+)\s"(?<request_url>.[^"]*)"\s"(?<http_user_agent>.[^"]*)"\s(?<server_ip>[\d\.]+)\:(?<server_port>\d+)(?:\s\d+\s\d+\s)(?<ssl_version>\S+)\s(?<ssl_cipher>\S+)\s(?<http_cookie>\S+)
+```
+
+### Event
+
+```
+Dec 24 01:23:45 192.168.0.2 nginx: 1.2.3.4 55763 - - [24/Dec/2021:01:23:45 +0000] "GET /page.html HTTP/2.0" 200 494 "https://www.example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0" 192.168.1.2:8080 123 4 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 abcdef .
+```
+
+### Fields
+
+Field | Value | Regex | Explanation
+--- | --- | --- | ---
+month | Dec | `(?<month>\w+)` | One or more alphanumeric
+day | 24 | `(?<day>\d+)` | One or more digit
+time | 01:23:45 | `(?<time>[\d\:]+)` | One or more digit or semicolon
+proxy_ip | 192.168.0.2 | `(?<proxy_ip>[\d\.]+)` | One or more digit or dot
+remote_ip | 1.2.3.4 | `(?<remote_ip>[\d\.]+)` |
+time_local | 24/Dec/2021:01:23:45 | `(?<time_local>\S+)` | One or more non-whitespace characters
+timezone | +0000 | `(?<timezone>[\+\-]\d{4})` | Four digits with plus or minus prefix
+http_method | GET | `(?<http_method>\w+)` |
+http_path | /page.html | `(?<http_path>.+)` | One or more of any character
+http_version | HTTP/2.0 | `(?<http_version>HTTP/\d\.\d)` | "HTTP", a digit, dot and digit
+http_status | 200 | `(?<http_status>\d{3})` | Three digits
+request_url | https://www.example.com | `(?<request_url>.[^"]*)` | Zero or more of any character except double quote
+http_user_agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0 | `(?<http_user_agent>.[^"]*)` |
+server_ip | 192.168.1.2 | `(?<server_ip>[\d\.]+)` |
+server_port | 8080 | `(?<server_port>\d+)` |
+ssl_version | TLSv1.2 | `(?<ssl_version>\S+)` |
+ssl_cipher | ECDHE-RSA-AES128-GCM-SHA256 | `(?<ssl_cipher>\S+)` |
+http_cookie | abcdef | `(?<http_cookie>\S+)` |
+
+nginx is configured as a reverse proxy, `proxy_ip` is its ip whereas `server_ip` is the upstream's.
+
+## Proxy request
+
+### Regex
+
+```
+(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\sclient\s)(?<remote_ip>[\d\.]+)\:(?<remote_port>\d+)(?:\sconnected\sto\s)(?<server_ip>[\d\.]+)\:(?<server_port>\d+)
+```
+
+### Event
+
+```
+Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] 1776#1776:*114333142 client 1.2.3.4:19802 connected to 192.168.1.2:8080
+```
+
+### Fields
+
+Field | Value | Regex | Explanation
+--- | --- | --- | ---
+month | Dec | `(?<month>\w+)` |
+day | 24 | `(?<day>\d+)` |
+time | 01:23:45 | `(?<time>[\d\:]+)` |
+proxy_ip | 192.168.0.2 | `(?<proxy_ip>[\d\.]+)` |
+year | 2021 | `(?<year>\d{4})` |
+nmonth | 12 | `(?<nmonth>\d{2})` |
+log_level | info | `(?<log_level>\w+)` |
+remote_ip | 1.2.3.4 | `(?<remote_ip>[\d\.]+)` |
+remote_port | 19802 | `(?<remote_port>\d+)` |
+server_ip | 192.168.1.2 | `(?<server_ip>[\d\.]+)` |
+server_port | 8080 | `(?<server_port>\d+)` |
+
+## Upstream error response
+
+### Regex
+
+```
+(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\s)(?<upstream_error>.[^,]*)(?:,\sclient\:\s)(?<remote_ip>[\d\.]+)(?:,\sserver\:\s)(?<server_host>.[^,]*)(?:,\srequest\:\s")(?<http_method>\w+)\s(?<http_path>\S+)\s(?<http_version>HTTP/\d\.\d)(?:",\supstream\:\s")(?<upstream_url>.[^"]*)",\shost\:\s"(?<upstream_host>.[^"]*)
+```
+
+### Event
+
+```
+Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [error] 1776#1776:*71197740 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 1.2.3.4, server: example.com, request: "POST /api/path HTTP/2.0",upstream: "http://192.168.1.2:8080/api/path", host:"example.com"
+```
+
+### Fields
+
+Field | Value | Regex | Explanation
+--- | --- | --- | ---
+month | Dec | `(?<month>\w+)` |
+day | 24 | `(?<day>\d+)` |
+time | 01:23:45 | `(?<time>[\d\:]+)` |
+proxy_ip | 192.168.0.2 | `(?<remote_ip>[\d\.]+)` |
+year | 2021 | `(?<year>\d{4})` |
+nmonth | 12 | `(?<nmonth>\d{2})` |
+log_level | error | `(?<log_level>\w+)` |
+upstream_error | upstream timed out (110: Connection timed out) while reading response header from upstream | `(?<upstream_error>.[^,]*)` | Zero or more of any character except comma
+remote_ip | 1.2.3.4 | `(?<remote_ip>[\d\.]+)` |
+server_host | example.com | `(?<server_host>.[^,]*)` |
+http_method | POST | `(?<http_method>\w+)` |
+http_path | /api/path | `(?<http_path>\S+)` |
+http_version | HTTP/2.0 | `(?<http_version>HTTP/\d\.\d)` |
+upstream_url | http://192.168.1.2:8080/api/path | `(?<upstream_url>.[^"]*)` |
+upstream_host | example.com | `(?<upstream_host>.[^"]*)` |
+
+## Upstream epoll error
+
+### Regex
+
+```
+(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\s)(?<upstream_error>[^,]*,[^,]*)(?:,\sclient\:\s)(?<remote_ip>[\d\.]+)(?:,\sserver\:\s)(?<server_host>.[^,]*)(?:,\srequest\:\s")(?<http_method>\w+)\s(?<http_path>\S+)\s(?<http_version>HTTP/\d\.\d)(?:",\supstream\:\s")(?<upstream_url>.[^"]*)(?:",\shost\:\s")(?<upstream_host>.[^"]*)
+```
+
+### Event
+
+```
+Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] 13199#13199: *81574833 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while connecting to upstream, client: 1.2.3.4, server: example.com, request: "GET /page.html HTTP/1.1", upstream:"http://192.168.1.2/page.html", host: "example.com"
+```
+
+### Fields
+
+Field | Value | Regex | Explanation
+--- | --- | --- | ---
+month | Dec | `(?<month>\w+)` |
+day | 24 | `(?<day>\d+)` |
+time | 01:23:45 | `(?<time>[\d\:]+)` |
+proxy_ip | 192.168.0.2 | `(?<remote_ip>[\d\.]+)` |
+year | 2021 | `(?<year>\d{4})` |
+nmonth | 12 | `(?<nmonth>\d{2})` |
+log_level | info | `(?<log_level>\w+)` |
+upstream_error | epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while connecting to upstream | `(?<upstream_error>.[^,]*)` |
+remote_ip | 1.2.3.4 | `(?<remote_ip>[\d\.]+)` |
+server_host | example.com | `(?<server_host>.[^,]*)` |
+http_method | GET | `(?<http_method>\w+)` |
+http_path | /page.html | `(?<http_path>\S+)` |
+http_version | HTTP/1.1 | `(?<http_version>HTTP/\d\.\d)` |
+upstream_url | http://192.168.1.2/page.html | `(?<upstream_url>.[^"]*)` |
+upstream_host | example.com | `(?<upstream_host>.[^"]*)` |
+
+## Upstream epoll error with referrer
+
+### Regex
+
+```
+(?<month>\w+)\s+(?<day>\d+)\s(?<time>[\d\:]+)\s(?<proxy_ip>[\d\.]+)(?:\snginx\:\s)(?<year>\d{4})\/(?<nmonth>\d{2})(?:\/\d{2}\s[\d\:]+\s)\[(?<log_level>\w+)\](?:\s\d+#\d+\:\s\*\d+\s)(?<upstream_error>[^,]*,[^,]*)(?:,\sclient\:\s)(?<remote_ip>[\d\.]+)(?:,\sserver\:\s)(?<server_host>.[^,]*)(?:,\srequest\:\s")(?<http_method>\w+)\s(?<http_path>\S+)\s(?<http_version>HTTP/\d\.\d)(?:",\supstream\:\s")(?<upstream_url>.[^"]*)(?:",\shost\:\s")(?<upstream_host>.[^"]*)(?:",\sreferrer\:\s")(?<referrer>.[^"]*)
+```
+
+### Event
+
+```
+Dec 24 01:23:45 192.168.0.2 nginx: 2021/12/24 01:23:45 [info] 1776#1776:*71220252 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 1.2.3.4, server: example.com, request: "GET /page.html HTTP/1.1", upstream: "http://192.168.1.2:8080/page.html", host: "example.com", referrer: "https://example.com"
+```
+
+### Fields
+
+Field | Value | Regex | Explanation
+--- | --- | --- | ---
+month | Dec | `(?<month>\w+)` |
+day | 24 | `(?<day>\d+)` |
+time | 01:23:45 | `(?<time>[\d\:]+)` |
+proxy_ip | 192.168.0.2 | `(?<remote_ip>[\d\.]+)` |
+year | 2021 | `(?<year>\d{4})` |
+nmonth | 12 | `(?<nmonth>\d{2})` |
+log_level | info | `(?<log_level>\w+)` |
+upstream_error | epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream | `(?<upstream_error>.[^,]*)` |
+remote_ip | 1.2.3.4 | `(?<remote_ip>[\d\.]+)` |
+server_host | example.com | `(?<server_host>.[^,]*)` |
+http_method | GET | `(?<http_method>\w+)` |
+http_path | /page.html | `(?<http_path>\S+)` |
+http_version | HTTP/1.1 | `(?<http_version>HTTP/\d\.\d)` |
+upstream_url | http://192.168.1.2:8080/page.html | `(?<upstream_url>.[^"]*)` |
+upstream_host | example.com | `(?<upstream_host>.[^"]*)` |
+referrer | https://example.com | `(?<referrer>.[^"]*)` |