---
title: Configure Splunk Universal Forwarder to ingest JSON files
excerpt: Parse single-line JSON into separate events
date: 2023-06-17
updated: 2023-08-13
tags:
  - splunk
---

The recommended logging format according to [Splunk best practice](https://dev.splunk.com/enterprise/docs/developapps/addsupport/logging/loggingbestpractices/#Use-developer-friendly-formats) looks like this:

```json example.log
{ "datetime": 1672531212123456, "event_id": 1, "key1": "value1", "key2": "value2", "key3": "value3" }
{ "datetime": 1672531213789012, "event_id": 2, "key1": "value1", "key2": "value2", "key3": "value3" }
{ "datetime": 1672531214345678, "event_id": 3, "key1": "value1", "key2": "value2", "key3": "value3" }
```

- Each **event** is a JSON object, but the file as a whole is not JSON.
  - This also means the log file is not a valid JSON file.
- Events are separated by a newline.
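
To make the distinction concrete, here is a minimal Python sketch (not part of any Splunk tooling) that checks both properties against the sample "example.log": every line parses as JSON on its own, while the file as a whole does not. The `is_json` helper is a made-up name for illustration.

```python
import json

# Contents of the sample "example.log" shown above.
EXAMPLE_LOG = """\
{ "datetime": 1672531212123456, "event_id": 1, "key1": "value1", "key2": "value2", "key3": "value3" }
{ "datetime": 1672531213789012, "event_id": 2, "key1": "value1", "key2": "value2", "key3": "value3" }
{ "datetime": 1672531214345678, "event_id": 3, "key1": "value1", "key2": "value2", "key3": "value3" }
"""


def is_json(text: str) -> bool:
    """Return True if `text` parses as a single JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


lines = EXAMPLE_LOG.splitlines()
each_line_is_json = all(is_json(line) for line in lines)
whole_file_is_json = is_json(EXAMPLE_LOG)

print(each_line_is_json)   # every event is valid JSON on its own
print(whole_file_is_json)  # the file as a whole is not
```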

This format can be achieved by exporting each live event as JSON and appending it to a log file. However, I encountered a situation where the log file could only be generated in batches. Exporting the equivalent of the previous "example.log" as JSON, without string manipulation, looks like this:

```json example.json
[{"datetime": 1672531212123456, "event_id": 1, "key1": "value1", "key2": "value2", "key3": "value3"}, {"datetime": 1672531213789012, "event_id": 2, "key1": "value1", "key2": "value2", "key3": "value3"}, {"datetime": 1672531214345678, "event_id": 3, "key1": "value1", "key2": "value2", "key3": "value3"}]
```
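
As an illustration of how such a file comes about, the sketch below serialises a list of events in one call. Python's `json.dumps` stands in for whatever batch exporter is actually in use, which is an assumption, and the `events` list is hypothetical sample data. The result is a single line in which neighbouring objects are joined by `}, {`.

```python
import json

# Hypothetical batch of events, mirroring the sample above.
events = [
    {"datetime": 1672531212123456, "event_id": 1, "key1": "value1", "key2": "value2", "key3": "value3"},
    {"datetime": 1672531213789012, "event_id": 2, "key1": "value1", "key2": "value2", "key3": "value3"},
    {"datetime": 1672531214345678, "event_id": 3, "key1": "value1", "key2": "value2", "key3": "value3"},
]

# Serialising the whole list at once yields one JSON array on one line.
batch = json.dumps(events)

# Count the places where one object ends and the next begins.
boundaries = batch.count("}, {")

print(batch)
print(boundaries)
```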

This post details the configuration required for Splunk to parse "example.json" into separate events, even though the file is a single valid JSON array rather than a list of newline-separated events.

## App-specific inputs.conf

```conf $SPLUNK_HOME/etc/deployment-apps/foo/local/inputs.conf
[monitor:///var/log/app_a]
disabled = 0
index = index_name
sourcetype = app_a_event
```

The [**monitor**](https://docs.splunk.com/Documentation/Splunk/latest/Admin/Inputsconf#MONITOR:) directive is made up of two parts: `monitor://` and the path, e.g. `/var/log/app_a`. Unlike most Splunk configs, this directive doesn't require the backslash (used in Windows paths) to be escaped, e.g. `monitor://C:\foo\bar`.

A path can be a file or a folder. When the wildcard (\*) is used to match multiple folders, another wildcard must be specified to match the files inside those folders, because a wildcard only applies to a single path segment. For example, to match all of the following files, use `monitor:///var/log/app_*/*`. Splunk also supports "..." for recursive matching.

```
/var/log/
├── app_a
│   ├── 1.log
│   ├── 2.log
│   └── 3.log
├── app_b
│   ├── 1.log
│   ├── 2.log
│   └── 3.log
└── app_c
    ├── 1.log
    ├── 2.log
    └── 3.log
```
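
The matching rules can be approximated with a short Python sketch. It translates a monitor path into a regular expression, treating `*` as "anything within one path segment" and `...` as "anything, across segments". This mirrors the behaviour described above rather than Splunk's actual implementation, and `monitor_path_to_regex` is a hypothetical helper.

```python
import re


def monitor_path_to_regex(path: str) -> re.Pattern:
    """Approximate a Splunk monitor path as a regex (illustrative only)."""
    parts = []
    # Split on the two wildcard tokens, keeping them in the result.
    for token in re.split(r"(\.\.\.|\*)", path):
        if token == "...":
            parts.append(".*")      # recursive: may cross "/" boundaries
        elif token == "*":
            parts.append("[^/]*")   # single segment: stops at "/"
        else:
            parts.append(re.escape(token))
    return re.compile("^" + "".join(parts) + "$")


files = ["/var/log/app_a/1.log", "/var/log/app_b/2.log", "/var/log/app_c/3.log"]

single = monitor_path_to_regex("/var/log/app_*/*")          # folders, then files
folders_only = monitor_path_to_regex("/var/log/app_*")      # stops at the folder
recursive = monitor_path_to_regex("/var/log/.../*.log")     # any depth

for f in files:
    print(f, bool(single.match(f)), bool(folders_only.match(f)))
```

In this approximation, `/var/log/app_*` on its own matches the folders but none of the files inside them, which is why the second wildcard is needed.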

Specify an appropriate value for the **sourcetype** config. It becomes the value of the `sourcetype` field in events ingested under this "monitor" directive. Take note of the value you configure, as it is used in the rest of the configuration.

## App-specific props.conf

```conf $SPLUNK_HOME/etc/deployment-apps/foo/local/props.conf
[app_a_event]
description = App A logs
disabled = 0
INDEXED_EXTRACTIONS = JSON
# remove bracket at the start and end of each line
SEDCMD-a = s/^\[//g
SEDCMD-b = s/\]$//g
# separate each object into a line
LINE_BREAKER = }(,){\"datetime\"
# a line represents an event
SHOULD_LINEMERGE = 0
TIMESTAMP_FIELDS = datetime
## default is 2000
# MAX_DAYS_AGO = 3560
# TIME_FORMAT = %s
```

The directive name must match the **sourcetype** value specified in the [inputs.conf](#App-specific-inputs-conf).

- SEDCMD: a [sed](https://tldr.inbrowser.app/pages/common/sed) script. `SEDCMD-<name>` can be specified multiple times to run different scripts, each with a different name.
  - `s/^\[//g` removes "[" at the start of each line.
  - `s/\]$//g` removes "]" at the end of each line.
- LINE_BREAKER: Searches for text that matches the regex and treats only the first capturing group as the event boundary. The captured text is discarded, which in effect replaces it with a newline (\n). This puts each event on its own line.
  - `}(,){\"datetime\"` searches for `},{"datetime"` and replaces "," with "\n". The pattern allows no whitespace after the comma, so if the export separates objects with `}, {` as in "example.json" above, a pattern such as `}(,\s*){\"datetime\"` would be needed instead.
- SHOULD_LINEMERGE: Only needed for events that span multiple lines. Here it is the reverse: the log file has all events on one line.
- TIMESTAMP_FIELDS: Refers to the `datetime` key in "example.json".
- MAX_DAYS_AGO (optional): Specify a larger value if there are events older than 2,000 days.
- TIME_FORMAT (optional): Not required when Unix time is used. It is also unnecessary to specify `%s%3N` when the Unix time includes [subseconds](https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Commontimeformatvariables).
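
The effect of these settings can be approximated outside Splunk with a short Python sketch. It is a simulation under stated assumptions, not how Splunk is implemented: it assumes events are broken before the SEDCMD substitutions run, that `datetime` holds microseconds since the Unix epoch, and that the input has no whitespace between objects. The function names (`break_events`, `apply_sedcmds`, `epoch_us_to_utc`) and the shortened sample events are made up for illustration.

```python
import json
import re
from datetime import datetime, timedelta, timezone

# Shortened "example.json" content, written without whitespace between
# objects so that the LINE_BREAKER pattern applies as written.
RAW = (
    '[{"datetime": 1672531212123456, "event_id": 1, "key1": "value1"},'
    '{"datetime": 1672531213789012, "event_id": 2, "key1": "value1"},'
    '{"datetime": 1672531214345678, "event_id": 3, "key1": "value1"}]'
)

LINE_BREAKER = r'}(,){\"datetime\"'
SEDCMDS = [(r"^\[", ""), (r"\]$", "")]


def break_events(raw: str, line_breaker: str) -> list:
    """Split `raw` wherever the pattern matches, dropping capture group 1."""
    events, start = [], 0
    for match in re.finditer(line_breaker, raw):
        events.append(raw[start:match.start(1)])  # text up to the group
        start = match.end(1)                      # resume after the group
    events.append(raw[start:])
    return events


def apply_sedcmds(event: str) -> str:
    """Apply each s/pattern/replacement/ substitution in turn."""
    for pattern, replacement in SEDCMDS:
        event = re.sub(pattern, replacement, event)
    return event


def epoch_us_to_utc(epoch_us: int) -> datetime:
    """Interpret an integer as microseconds since the Unix epoch (assumed)."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=epoch_us)


events = [json.loads(apply_sedcmds(e)) for e in break_events(RAW, LINE_BREAKER)]
for event in events:
    print(event["event_id"], epoch_us_to_utc(event["datetime"]).isoformat())
```

Each of the three resulting strings parses as a standalone JSON object, and the first event's timestamp resolves to `2023-01-01T00:00:12.123456+00:00`.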

## System props.conf

```conf $SPLUNK_HOME/etc/system/local/props.conf
[app_a_event]
description = App A logs
disabled = 0
KV_MODE = none
AUTO_KV_JSON = 0
SHOULD_LINEMERGE = 0
# MAX_DAYS_AGO = 3560
```

For a Splunk Cloud deployment, the above configuration can only be added through Splunk Web: **Settings > [Source types](https://docs.splunk.com/Documentation/SplunkCloud/latest/Data/Managesourcetypes)**.