---
title: Parsing NGINX log in Splunk
excerpt: Configure regex in field extractor to create relevant fields
date: 2021-12-25
tags:
- splunk
- nginx
---

For web server access logs, Splunk has built-in support for Apache only. Splunk has a feature called field extractor. It is powered by delimiters and regex, and enables users to add new [_fields_](https://docs.splunk.com/Documentation/Splunk/8.2.3/Knowledge/Aboutfields) to be used in a search query. This post only covers the regex patterns to parse nginx logs; for instructions on the field extractor, I recommend perusing the [official documentation](https://docs.splunk.com/Documentation/Splunk/8.2.3/Knowledge/ExtractfieldsinteractivelywithIFX).

To illustrate, say we have a log format like this:

```
{id} "{http.request.host}" "{http.request.header.user-agent}"
```

An example log is:

```
123 "example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"
```

While you could search for a specific keyword, e.g. attempts of {% post_link log4shell-log4j-unbound-dns 'Log4shell exploit' %}, since there are no fields, you cannot run any statistics like [`table`](https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Table) or [`stats`](https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/stats) on the search results.

Splunk is able to understand the Apache log format because its field extractor already includes the necessary regex patterns to parse the relevant fields of each line in a log. Choosing a source type is equivalent to choosing a log format. If a format is not listed in [the default list](https://docs.splunk.com/Documentation/Splunk/8.2.3/Data/Listofpretrainedsourcetypes), we can either use an add-on or create new fields using the field extractor. There is a Splunk [add-on](https://docs.splunk.com/Documentation/AddOns/latest/NGINX) for nginx and I suggest trying it before resorting to the field extractor.
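
Before getting to the patterns, it may help to see what "fields" look like for the sample log line above. The following is a minimal sketch in Python, not Splunk itself: the field names `id`, `host` and `user_agent` are my own illustrative choices, and Python spells named groups as `(?P<name>...)`, whereas the Splunk documentation uses the `(?<name>...)` form.

```python
import re

# Hypothetical field names (id, host, user_agent) chosen for illustration only.
# Python requires the (?P<name>...) spelling for named capturing groups.
LINE_PATTERN = re.compile(
    r'(?P<id>\d+)\s+"(?P<host>[^"]+)"\s+"(?P<user_agent>[^"]+)"'
)


def extract_fields(line):
    """Return a dict of field name -> value, or None if the line does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None


sample = (
    '123 "example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) '
    'Gecko/20100101 Firefox/95.0"'
)
fields = extract_fields(sample)
print(fields)
```

Once each event carries named values like these, commands such as `table` or `stats` have something to group and count by, which is what the field extractor provides inside Splunk.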

I created five patterns which cover most of the nginx events I encountered during my work. Refer to the documentation for [supported syntax](https://docs.splunk.com/Documentation/Splunk/8.2.3/Knowledge/AboutSplunkregularexpressions). A field is extracted through a "capturing group".

```
(?<field_name>capture pattern)
```

For example, `(?<month>\w+)` searches for one or more (`+`) word characters (`\w`, i.e. alphanumerics and underscore) and names the field `month`. I opted for looser matching, mostly using the unbounded quantifier `+` instead of a stricter range of occurrences `{M,N}`, despite knowing the exact pattern of a field. I found that some fields stray slightly from the expected pattern, so looser matching tends to match more events without matching unwanted ones.

## Web request

### Regex

```
(?\w+)\s+(?\d+)\s(?