mirror of https://gitlab.com/curben/blog
post: Configure Splunk Universal Forwarder to ingest JSON files
---
title: Configure Splunk Universal Forwarder to ingest JSON files
excerpt: Parse single-line JSON into separate events
date: 2023-06-17
tags:
- splunk
---

The recommended logging format according to [Splunk best practice](https://dev.splunk.com/enterprise/docs/developapps/addsupport/logging/loggingbestpractices/#Use-developer-friendly-formats) looks like this:

```json example.log
{ "datetime": 1672531212123456, "event_id": 1, "key1": "value1", "key2": "value2", "key3": "value3" }
{ "datetime": 1672531213789012, "event_id": 2, "key1": "value1", "key2": "value2", "key3": "value3" }
{ "datetime": 1672531214345678, "event_id": 3, "key1": "value1", "key2": "value2", "key3": "value3" }
```

- Each **event** is in JSON, not the whole file.
- This also means the log file as a whole is not a valid JSON file.
- Events are separated by newlines.

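The distinction can be checked with a short script. A sketch, using abbreviated versions of the events from "example.log" above:

```python
import json

# Three events in the recommended format: one JSON object per line
# (fields abbreviated from the example.log sample above).
log_lines = [
    '{ "datetime": 1672531212123456, "event_id": 1, "key1": "value1" }',
    '{ "datetime": 1672531213789012, "event_id": 2, "key1": "value1" }',
    '{ "datetime": 1672531214345678, "event_id": 3, "key1": "value1" }',
]

# Each individual line parses as JSON.
events = [json.loads(line) for line in log_lines]
print([e["event_id"] for e in events])  # [1, 2, 3]

# But the file as a whole does not: after the first object,
# json.loads() rejects the rest as extra data.
try:
    json.loads("\n".join(log_lines))
except json.JSONDecodeError as err:
    print("not a valid JSON document:", err.msg)
```
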
This format can be achieved by exporting live events in JSON and appending them to a log file. However, I encountered a situation where the log file could only be generated in batches. Exporting the equivalent of the previous "example.log" as JSON without string manipulation looks like this:

```json example.json
[{"datetime": 1672531212123456, "event_id": 1, "key1": "value1", "key2": "value2", "key3": "value3"}, {"datetime": 1672531213789012, "event_id": 2, "key1": "value1", "key2": "value2", "key3": "value3"}, {"datetime": 1672531214345678, "event_id": 3, "key1": "value1", "key2": "value2", "key3": "value3"}]
```

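This is what a batch export naturally produces: serialising a list of events with a stock JSON library emits the whole array on a single line. A minimal sketch (with the event fields abbreviated from "example.json" above):

```python
import json

# A batch of events collected in memory (fields abbreviated).
events = [
    {"datetime": 1672531212123456, "event_id": 1, "key1": "value1"},
    {"datetime": 1672531213789012, "event_id": 2, "key1": "value1"},
]

# json.dumps() of a list yields one single-line JSON array --
# valid JSON, but not the newline-delimited format Splunk prefers.
batch = json.dumps(events)
print(batch.count("\n"))  # 0: the entire export is a single line
```
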
In this post, I will detail the configurations required for Splunk to parse it correctly, even though "example.json" is a valid JSON file rather than the recommended newline-delimited format.

## App-specific inputs.conf

```conf $SPLUNK_HOME/etc/deployment-apps/foo/local/inputs.conf
[monitor:///var/log/app_a]
disabled = 0
index = index_name
sourcetype = app_a_event
```

The [**monitor**](https://docs.splunk.com/Documentation/Splunk/latest/Admin/Inputsconf#MONITOR:) directive is made up of two parts: `monitor://` and the path, e.g. `/var/log/app_a`. Unlike most Splunk configs, this directive doesn't require the backslash (used in Windows paths) to be escaped, e.g. `monitor://C:\foo\bar`.

A path can be a file or a folder. When a wildcard (\*) is used to match multiple folders, another wildcard needs to be specified to match the files inside those matched folders, because a wildcard only matches a single path segment. For example, to match all of the following files, use `monitor:///var/log/app_*/*`. Splunk also supports `...` for recursive matching.

```
/var/log/
├── app_a
│   ├── 1.log
│   ├── 2.log
│   └── 3.log
├── app_b
│   ├── 1.log
│   ├── 2.log
│   └── 3.log
└── app_c
    ├── 1.log
    ├── 2.log
    └── 3.log
```

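The single-segment behaviour of `*` resembles a shell glob. Python's `pathlib` follows the same per-segment rule, so it can illustrate which of the files above `monitor:///var/log/app_*/*` would pick up (a rough analogy only; `pathlib` is not Splunk's matcher):

```python
from pathlib import PurePosixPath

pattern = "/var/log/app_*/*"

# One wildcard per path segment: app_* matches the folder, * the file.
print(PurePosixPath("/var/log/app_a/1.log").match(pattern))  # True
print(PurePosixPath("/var/log/app_c/3.log").match(pattern))  # True

# A wildcard does not cross into deeper segments; Splunk's "..."
# is needed for recursive matching.
print(PurePosixPath("/var/log/app_a/sub/1.log").match(pattern))  # False
```
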
Specify an appropriate value in the **sourcetype** config; it becomes the value of the `sourcetype` field in every event ingested under that "monitor" directive. Take note of the value you have configured, as it will be used in the rest of the configurations.

## App-specific props.conf

```conf $SPLUNK_HOME/etc/deployment-apps/foo/local/props.conf
[app_a_event]
description = App A logs
disabled = 0
INDEXED_EXTRACTIONS = JSON
# remove bracket at the start and end of each line
SEDCMD-a = s/^\[//g
SEDCMD-b = s/\]$//g
# separate each object into a line
LINE_BREAKER = }(,){\"datetime\"
# a line represents an event
SHOULD_LINEMERGE = 0
TIMESTAMP_FIELDS = datetime
## default is 2000
# MAX_DAYS_AGO = 3560
# TIME_FORMAT = %s
```

The stanza name should be the **sourcetype** value specified in [inputs.conf](#App-specific-inputs-conf).

- SEDCMD: a [sed](https://tldr.inbrowser.app/pages/common/sed) script. `SEDCMD-<name>` can be specified multiple times to run different scripts, each with a different name.
  - `s/^\[//g` removes "[" at the start of each line.
  - `s/\]$//g` removes "]" at the end of each line.
- LINE_BREAKER: Searches for strings that match the regex and replaces only the capturing group with a newline (\n). This separates each event onto its own line.
  - `}(,){\"datetime\"` searches for `},{"datetime"` and replaces "," with "\n".
- SHOULD_LINEMERGE: Only used for events that span multiple lines. This case is the reverse: the log file has all events on one line.
- TIMESTAMP_FIELDS: Refers to the `datetime` key in "example.json".
- MAX_DAYS_AGO (optional): Specify a value if there are events older than 2,000 days.
- TIME_FORMAT: Optional if Unix time is used. Even when there is a [subsecond](https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Commontimeformatvariables) component, it is not necessary to specify `%s%3N`.

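The combined effect of the two SEDCMD scripts and the LINE_BREAKER can be previewed outside Splunk. The sketch below mimics what the three settings accomplish together using Python's `re` module (an approximation only: the sample input here has no space after the commas, so that the documented `LINE_BREAKER` regex matches as written):

```python
import json
import re

# Batch export: one single-line JSON array (fields abbreviated,
# no space after the commas between objects).
raw = ('[{"datetime": 1672531212123456, "event_id": 1},'
       '{"datetime": 1672531213789012, "event_id": 2},'
       '{"datetime": 1672531214345678, "event_id": 3}]')

# SEDCMD-a / SEDCMD-b: strip the leading "[" and trailing "]".
stripped = re.sub(r"^\[", "", raw)
stripped = re.sub(r"\]$", "", stripped)

# LINE_BREAKER }(,){\"datetime\": the capturing group (the comma)
# becomes the event boundary, i.e. a newline.
broken = re.sub(r'\}(,)\{"datetime"', '}\n{"datetime"', stripped)

# Each resulting line is now a self-contained JSON event.
for line in broken.split("\n"):
    print(json.loads(line)["event_id"])  # 1, 2, 3
```
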

## System props.conf

```conf $SPLUNK_HOME/etc/system/local/props.conf
[app_a_event]
description = App A logs
disabled = 0
KV_MODE = none
AUTO_KV_JSON = 0
SHOULD_LINEMERGE = 0
# MAX_DAYS_AGO = 3560
```