---
title: Configure Splunk Universal Forwarder to ingest JSON files
excerpt: Parse single-line JSON into separate events
date: 2023-06-17
updated: 2024-01-05
tags:
- splunk
---
The recommended logging format according to [Splunk best practice](https://dev.splunk.com/enterprise/docs/developapps/addsupport/logging/loggingbestpractices/#Use-developer-friendly-formats) looks like this:
```json example.log
{ "datetime": 1672531212123456, "event_id": 1, "key1": "value1", "key2": "value2", "key3": "value3" }
{ "datetime": 1672531213789012, "event_id": 2, "key1": "value1", "key2": "value2", "key3": "value3" }
{ "datetime": 1672531214345678, "event_id": 3, "key1": "value1", "key2": "value2", "key3": "value3" }
```
- Each **event** is in JSON, not the file as a whole.
- This means the log file itself is not a valid JSON document.
- Events are separated by newlines.
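As a sketch, the recommended format amounts to serialising each event with `json.dumps` and joining them with newlines (the event values here are hypothetical):

```python
import json

# Hypothetical events, mirroring the example.log structure
events = [
    {"datetime": 1672531212123456, "event_id": 1, "key1": "value1"},
    {"datetime": 1672531213789012, "event_id": 2, "key1": "value1"},
]

# One JSON object per line: the file as a whole is not valid JSON,
# but every individual line is.
ndjson = "\n".join(json.dumps(e) for e in events)
print(ndjson)
```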
This format can be achieved by exporting each live event as JSON and appending it to a log file. However, I encountered a situation where the log file could only be generated in batches. Exporting the equivalent of the previous "example.log" as JSON without string manipulation looks like this:
```json example.json
[
  {
    "datetime": 1672531212123456,
    "event_id": 1,
    "key1": "value1",
    "key2": "value2",
    "key3": "value3"
  },
  {
    "datetime": 1672531213789012,
    "event_id": 2,
    "key1": "value1",
    "key2": "value2",
    "key3": "value3"
  },
  {
    "datetime": 1672531214345678,
    "event_id": 3,
    "key1": "value1",
    "key2": "value2",
    "key3": "value3"
  }
]
```
I will detail the required configurations in this post, so that Splunk is able to parse "example.json" correctly even though it does not follow the recommended line-per-event format.
## UF inputs.conf
```plain $SPLUNK_HOME/etc/deployment-apps/foo/local/inputs.conf
[monitor:///var/log/app_a]
disabled = 0
index = index_name
sourcetype = app_a_event
```
The [**monitor**](https://docs.splunk.com/Documentation/Splunk/latest/Admin/Inputsconf#MONITOR:) directive is made up of two parts: `monitor://` and the path, e.g. `/var/log/app_a`. Unlike most Splunk configs, this directive doesn't require backslashes (used in Windows paths) to be escaped, e.g. `monitor://C:\foo\bar`.
A path can be a file or a folder. When a wildcard (`*`) is used to match multiple folders, another wildcard must be specified to match the files inside those folders, because a wildcard matches a single path segment only. For example, to match all the following files, use `monitor:///var/log/app_*/*`. Splunk also supports `...` for recursive matching.
```
/var/log/
├── app_a
│   ├── 1.log
│   ├── 2.log
│   └── 3.log
├── app_b
│   ├── 1.log
│   ├── 2.log
│   └── 3.log
└── app_c
    ├── 1.log
    ├── 2.log
    └── 3.log
```
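Splunk's single-segment wildcard behaves much like shell globbing. A quick Python sketch of why `app_*/*` needs the second wildcard, using a scratch copy of the tree above (paths hypothetical):

```python
import glob
import os
import tempfile

# Recreate the sample tree in a temporary directory
root = tempfile.mkdtemp()
for app in ("app_a", "app_b", "app_c"):
    os.makedirs(os.path.join(root, app))
    for name in ("1.log", "2.log", "3.log"):
        open(os.path.join(root, app, name), "w").close()

# "*" matches a single path segment, so a second wildcard is
# needed to reach the files inside the matched folders.
matched = glob.glob(os.path.join(root, "app_*", "*"))
print(len(matched))  # 9 files across the three folders
```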
Specify an appropriate value in the **sourcetype** setting; it becomes the value of the `sourcetype` field in events ingested under this "monitor" directive. Take note of the value you configure, as it is used in the rest of the configurations.
## Forwarder props.conf
```plain props.conf
[app_a_event]
description = App A logs
INDEXED_EXTRACTIONS = JSON
# separate each object into a line
LINE_BREAKER = }(,){\"datetime\"
# a line represents an event
SHOULD_LINEMERGE = 0
TIMESTAMP_FIELDS = datetime
TIME_FORMAT = %s
## default is 2000
# MAX_DAYS_AGO = 3560
```
The directive name should be the **sourcetype** value specified in [inputs.conf](#uf-inputsconf). The following settings apply to the universal forwarder because [`INDEXED_EXTRACTIONS`](https://docs.splunk.com/Documentation/Splunk/latest/Data/Extractfieldsfromfileswithstructureddata#Field_extraction_settings_for_forwarded_structured_data_must_be_configured_on_the_forwarder) is used.
- LINE_BREAKER: Searches for a string that matches the regex and replaces only the capturing group with a newline (`\n`). This separates the events onto separate lines.
  - `}(,){\"datetime\"` matches `},{"datetime"` and replaces the `,` with `\n`.
- SHOULD_LINEMERGE: Only needed for events that span multiple lines. This case is the reverse: the log file has all events on one line.
- TIMESTAMP_FIELDS: Refers to the `datetime` key in "example.json".
- MAX_DAYS_AGO (optional): Specify a value if there are events older than 2,000 days.
- TIME_FORMAT: Optional when Unix time is used, but recommended whenever possible. When Unix time is used, it is not necessary to specify `%s%3N` even when there is a [subsecond](https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Commontimeformatvariables) component.
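A rough simulation of the `LINE_BREAKER` step in Python: Splunk inserts an event boundary in place of the first capturing group, keeping the text on either side of it. This is an approximation of that behavior, not Splunk's actual parser, and the sample data is hypothetical:

```python
import json
import re

# A batch export with all events on a single line
raw = ('[{"datetime": 1672531212123456, "event_id": 1},'
       '{"datetime": 1672531213789012, "event_id": 2},'
       '{"datetime": 1672531214345678, "event_id": 3}]')

# LINE_BREAKER = }(,){\"datetime\"
# Only the capturing group (the comma) becomes a line break;
# the "}" before it and the '{"datetime"' after it are kept.
broken = re.sub(r'\}(,)(\{"datetime")', r'}\n\2', raw)
events = broken.splitlines()
print(len(events))  # 3 lines, one event each
```

The leading `[` and trailing `]` remain on the first and last lines; Splunk's JSON extraction tolerates them in this setup.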
The location of "props.conf" depends on whether the universal forwarder is centrally managed by a deployment server.
Path A: $SPLUNK_HOME/etc/deployment-apps/foo/local/props.conf
Path B: $SPLUNK_HOME/etc/apps/foo/local/props.conf
If there is a deployment server, the config file should be in path A; the server automatically deploys it to path B on the UF. If the UF is not centrally managed, the file goes straight to path B.
## Search head props.conf
```plain props.conf
[app_a_event]
description = App A logs
KV_MODE = none
AUTO_KV_JSON = 0
SHOULD_LINEMERGE = 0
```
Since index-time field extraction is already enabled using `INDEXED_EXTRACTIONS`, search-time field extraction is no longer necessary. If `KV_MODE` and `AUTO_KV_JSON` are not disabled, there will be duplicate fields in the search result.
In Splunk Enterprise, the above file can be saved in a custom app, e.g. "$SPLUNK_HOME/etc/apps/custom-app/default/props.conf".
For Splunk Cloud deployment, the above configuration can be added through a custom app or Splunk Web: **Settings > [Source types](https://docs.splunk.com/Documentation/SplunkCloud/latest/Data/Managesourcetypes)**.
## Ingesting API response
It is important to note `SEDCMD` [runs](https://www.aplura.com/assets/pdf/props_conf_order.pdf) [after](https://wiki.splunk.com/Community:HowIndexingWorks) `INDEXED_EXTRACTIONS`. I noticed [this behaviour](https://community.splunk.com/t5/Getting-Data-In/SEDCMD-not-actually-replacing-data-during-indexing/m-p/387812/highlight/true#M69511) when I tried to ingest API response of [LibreNMS](https://gitlab.com/curben/splunk-scripts/-/tree/main/TA-librenms-data-poller?ref_type=heads).
```json
{
"status": "ok",
"devices": [
{ "device_id": 1, "key1": "value1", "key2": "value2" },
{ "device_id": 2, "key1": "value1", "key2": "value2" },
{ "device_id": 3, "key1": "value1", "key2": "value2" }
],
"count": 3
}
```
In this scenario, I only wanted to ingest the "devices" array, where each item is an event. The previous approach not only failed to split the array; the "status" and "count" fields also remained in each event despite the use of SEDCMD to remove them.
The solution is not to use INDEXED_EXTRACTIONS (index-time field extraction), but KV_MODE (search-time field extraction) instead. INDEXED_EXTRACTIONS is left disabled so that SEDCMD works more reliably. If it is enabled, the JSON parser can unpredictably split part of the prefix (in this case `{"status": "ok", "devices": [`) or suffix into separate events, and SEDCMD does not work across events. SEDCMD does work with INDEXED_EXTRACTIONS, but you have to make sure the replacement occurs within a single event.
```plain props.conf
# heavy forwarder or indexer
[api_a_response]
description = API A response
# remove bracket at the start and end of each line
SEDCMD-remove_prefix = s/^\{"status": "ok", "devices": \[//g
SEDCMD-remove_suffix = s/\], "count": [0-9]+\}$//g
# separate each object into a line
LINE_BREAKER = }(, ){\"device_id\"
# if each line/event is very long
# TRUNCATE = 0
# a line represents an event
SHOULD_LINEMERGE = 0
```
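Put together, the SEDCMD and LINE_BREAKER steps amount to something like the following Python approximation of the sed expressions (not Splunk's actual pipeline; the response values are hypothetical):

```python
import json
import re

# Single-line API response, as returned by the endpoint
raw = ('{"status": "ok", "devices": ['
       '{"device_id": 1, "key1": "value1"}, '
       '{"device_id": 2, "key1": "value1"}, '
       '{"device_id": 3, "key1": "value1"}'
       '], "count": 3}')

# SEDCMD-remove_prefix and SEDCMD-remove_suffix
stripped = re.sub(r'^\{"status": "ok", "devices": \[', '', raw)
stripped = re.sub(r'\], "count": [0-9]+\}$', '', stripped)

# LINE_BREAKER = }(, ){\"device_id\"
# Only the captured ", " becomes a line break
broken = re.sub(r'\}(, )(\{"device_id")', r'}\n\2', stripped)
events = broken.splitlines()
print(len(events))  # 3 events, each a valid JSON object
```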
```plain props.conf
# search head
[api_a_response]
description = API A response
KV_MODE = json
AUTO_KV_JSON = 1
```