From d47131e6b733839be65bc19b01bb4ff7fa10dc09 Mon Sep 17 00:00:00 2001
From: Ming Di Leom <2809763-curben@users.noreply.gitlab.com>
Date: Sat, 17 Jun 2023 11:09:52 +0000
Subject: [PATCH] post: Configure Splunk Universal Forwarder to ingest JSON files

---
 source/_posts/json-splunk-uf.md | 102 ++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)
 create mode 100644 source/_posts/json-splunk-uf.md

diff --git a/source/_posts/json-splunk-uf.md b/source/_posts/json-splunk-uf.md
new file mode 100644
index 0000000..e66caef
--- /dev/null
+++ b/source/_posts/json-splunk-uf.md
@@ -0,0 +1,102 @@
+---
+title: Configure Splunk Universal Forwarder to ingest JSON files
+excerpt: Parse single-line JSON into separate events
+date: 2023-06-17
+tags:
+  - splunk
+---
+
+The recommended logging format according to [Splunk best practice](https://dev.splunk.com/enterprise/docs/developapps/addsupport/logging/loggingbestpractices/#Use-developer-friendly-formats) looks like this:
+
+```json example.log
+{ "datetime": 1672531212123456, "event_id": 1, "key1": "value1", "key2": "value2", "key3": "value3" }
+{ "datetime": 1672531213789012, "event_id": 2, "key1": "value1", "key2": "value2", "key3": "value3" }
+{ "datetime": 1672531214345678, "event_id": 3, "key1": "value1", "key2": "value2", "key3": "value3" }
+```
+
+- Each **event** is a JSON object, not the file as a whole.
+  - This also means the log file itself is not a valid JSON file.
+- Events are separated by a newline.
+
+The format can be achieved by exporting live events as JSON and appending them to a log file. However, I encountered a situation where the log file could only be generated in batches. Exporting the equivalent of the previous "example.log" as JSON without string manipulation looks like this:
+
+```json example.json
+[{"datetime": 1672531212123456, "event_id": 1, "key1": "value1", "key2": "value2", "key3": "value3"}, {"datetime": 1672531213789012, "event_id": 2, "key1": "value1", "key2": "value2", "key3": "value3"}, {"datetime": 1672531214345678, "event_id": 3, "key1": "value1", "key2": "value2", "key3": "value3"}]
+```
+
+This post details the configuration required for Splunk to parse it correctly into separate events, even though "example.json" is a single valid JSON array rather than newline-delimited events.
+
+## App-specific inputs.conf
+
+```conf $SPLUNK_HOME/etc/deployment-apps/foo/local/inputs.conf
+[monitor:///var/log/app_a]
+disabled = 0
+index = index_name
+sourcetype = app_a_event
+```
+
+The [**monitor**](https://docs.splunk.com/Documentation/Splunk/latest/Admin/Inputsconf#MONITOR:) directive is made up of two parts: `monitor://` and the path, e.g. `/var/log/app_a`. Unlike most Splunk configs, this directive doesn't require backslashes (used in Windows paths) to be escaped, e.g. `monitor://C:\foo\bar`.
+
+A path can be a file or a folder. When a wildcard (\*) is used to match multiple folders, another wildcard must be specified to match the files in those folders, because a wildcard matches a single path segment only. For example, to match all the following files, use `monitor:///var/log/app_*/*`. Splunk also supports "..." for recursive matching.
+
+```
+/var/log/
+├── app_a
+│   ├── 1.log
+│   ├── 2.log
+│   └── 3.log
+├── app_b
+│   ├── 1.log
+│   ├── 2.log
+│   └── 3.log
+└── app_c
+    ├── 1.log
+    ├── 2.log
+    └── 3.log
+```
+
+Specify an appropriate value for the **sourcetype** setting; it becomes the value of the `sourcetype` field in events ingested under this "monitor" directive. Take note of the value you configure, as it is used in the rest of the configuration.
+
+## App-specific props.conf
+
+```conf $SPLUNK_HOME/etc/deployment-apps/foo/local/props.conf
+[app_a_event]
+description = App A logs
+disabled = 0
+INDEXED_EXTRACTIONS = JSON
+# remove the bracket at the start and end of each line
+SEDCMD-a = s/^\[//g
+SEDCMD-b = s/\]$//g
+# separate each object into a line (optional whitespace may follow the comma)
+LINE_BREAKER = }(,\s*){\"datetime\"
+# a line represents an event
+SHOULD_LINEMERGE = 0
+TIMESTAMP_FIELDS = datetime
+## default is 2000
+# MAX_DAYS_AGO = 3560
+# TIME_FORMAT = %s
+```
+
+The directive name (the value in square brackets) must match the **sourcetype** value specified in [inputs.conf](#App-specific-inputs-conf).
+
+- SEDCMD: a [sed](https://tldr.inbrowser.app/pages/common/sed) script. `SEDCMD-` can be specified multiple times to run different scripts, each with a different name.
+  - `s/^\[//g` removes "[" at the start of each line.
+  - `s/\]$//g` removes "]" at the end of each line.
+- LINE_BREAKER: Splunk searches for text that matches the regex and breaks the data there, discarding only the capturing group, which is effectively the same as replacing it with a newline (\n). This puts each event on its own line.
+  - `}(,\s*){\"datetime\"` searches for `},{"datetime"`, allowing optional whitespace after the comma as in "example.json", and replaces the comma and any whitespace with "\n".
+- SHOULD_LINEMERGE: only needed for events that span multiple lines. Here it is the reverse: the log file has all events on one line, so line merging is disabled.
+- TIMESTAMP_FIELDS: refers to the `datetime` key in `example.json`.
+- MAX_DAYS_AGO (optional): specify a value if there are events older than 2,000 days.
+- TIME_FORMAT: optional if Unix time is used. With Unix time, it is not necessary to specify `%s%3N` even when the timestamp includes [subseconds](https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Commontimeformatvariables).
+
+## System props.conf
+
+```conf $SPLUNK_HOME/etc/system/local/props.conf
+[app_a_event]
+description = App A logs
+disabled = 0
+KV_MODE = none
+AUTO_KV_JSON = 0
+SHOULD_LINEMERGE = 0
+# MAX_DAYS_AGO = 3560
+```
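
The effect of the app-specific props.conf settings on "example.json" can be checked outside Splunk. The following is a minimal Python sketch, not a description of Splunk's internal pipeline: the function names `split_events` and `apply_sedcmd` are hypothetical, the order of the two steps is an assumption (for this input either order gives the same result), and the line-breaking regex allows optional whitespace after the comma so that it matches "example.json" exactly as printed in the post.

```python
import json
import re

# Contents of "example.json" from the post: a JSON array on a single line.
RAW = (
    '[{"datetime": 1672531212123456, "event_id": 1, "key1": "value1", "key2": "value2", "key3": "value3"}, '
    '{"datetime": 1672531213789012, "event_id": 2, "key1": "value1", "key2": "value2", "key3": "value3"}, '
    '{"datetime": 1672531214345678, "event_id": 3, "key1": "value1", "key2": "value2", "key3": "value3"}]'
)

# Python rendering of the LINE_BREAKER idea: only the capturing group
# (the comma plus any whitespace) is treated as the event boundary.
LINE_BREAKER = re.compile(r'\}(,\s*)\{"datetime"')


def split_events(raw):
    """Break `raw` at each match, discarding only the captured group."""
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])  # keeps the closing "}"
        start = match.end(1)                      # resumes at the opening "{"
    events.append(raw[start:])
    return events


def apply_sedcmd(event):
    """Mimic SEDCMD-a and SEDCMD-b: strip a leading "[" and a trailing "]"."""
    event = re.sub(r'^\[', '', event)
    return re.sub(r'\]$', '', event)


# Every resulting event should now be a standalone JSON object.
events = [json.loads(apply_sedcmd(e)) for e in split_events(RAW)]
for event in events:
    print(event["event_id"], event["datetime"])
```

Running the sketch prints one line per event, which is a quick way to confirm that a regex and the two sed expressions produce parseable JSON objects before deploying the config to a forwarder.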