post: Malicious website detection on Splunk using malware-filter

Ming Di Leom 2023-04-16 05:58:11 +00:00
parent bb1a561060
commit 483c071f01
1 changed files with 480 additions and 0 deletions

---
title: Malicious website detection on Splunk using malware-filter
excerpt: A guide on using malware-filter lookups
date: 2023-04-16
tags:
- splunk
---
[Splunk Add-on for malware-filter](https://gitlab.com/malware-filter/splunk-malware-filter) includes the following CSV files:
- botnet-filter-splunk.csv
- botnet_ip.csv
- opendbl_ip.csv
- phishing-filter-splunk.csv
- pup-filter-splunk.csv
- urlhaus-filter-splunk-online.csv
- vn-badsite-filter-splunk.csv
These CSV files can be used as [lookups](https://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Aboutlookupsandfieldactions) to find potentially malicious traffic. They contain a list of bad IPs/domains/URLs and we are going to look for those values in the [events](https://docs.splunk.com/Splexicon:Event).
We can view the content of a lookup file by using [`inputlookup`](https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Inputlookup). When using that command, there should always be a leading pipe character "|" because it is an [event-generating](https://docs.splunk.com/Splexicon:Generatingcommand) command.
## Lookup file locations
A lookup file can be uploaded via [Splunk Web](https://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Usefieldlookupstoaddinformationtoyourevents#Upload_the_lookup_table_file) or created directly in one of the following locations:
- `$SPLUNK_HOME/etc/users/<username>/<app_name>/lookups/`
- `$SPLUNK_HOME/etc/apps/<app_name>/lookups/`
- `$SPLUNK_HOME/etc/system/lookups/`
In Splunk Web, setting the permission to app-sharing or global-sharing automatically moves the file to the second or third location respectively. A lookup file can be used straight away without reloading the app or restarting Splunk, regardless of how it was created.
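A quick way to confirm that a newly added lookup file is readable from the current app is to count its rows. This is only a sketch; `botnet_ip.csv` stands in for whichever file was added.

```spl
| inputlookup botnet_ip.csv
| stats count AS rows
```

If the file is missing or not shared with the current app, Splunk reports that the lookup table does not exist or is not available.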
## inputlookup basics
```spl
| inputlookup botnet_ip.csv
```
> `_time` field is omitted for brevity.
| first_seen_utc | dst_ip | dst_port | c2_status | last_online | malware | updated |
| ------------------- | ------- | -------- | --------- | ----------- | ------- | -------------------- |
| 2021-05-16 19:49:33 | 1.2.3.4 | 1234 | online | 2023-03-05 | Lorem | 2023-03-04T16:41:17Z |
The output is no different from any other event: we can specify which fields to display and then rename them.
```spl
| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst
```
| dst |
| ------- |
| 1.2.3.4 |
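Rows can also be filtered inside `inputlookup` itself with a `where` clause, which is handy for trimming a lookup before it is used elsewhere. A minimal sketch, assuming only C2 servers that are still marked `online` are of interest:

```spl
| inputlookup botnet_ip.csv where c2_status="online"
| fields dst_ip, malware
| rename dst_ip AS dst
```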
## Search for specific events
Example firewall events:
```spl
index=firewall
```
| src | src_port | dst | action |
| ----------- | -------- | ------- | ------- |
| 192.168.1.5 | 45454 | 1.2.3.4 | allowed |
| 192.168.1.3 | 45452 | 7.6.5.4 | allowed |
| 192.168.1.4 | 45457 | 4.3.2.1 | allowed |
| 192.168.1.6 | 45451 | 7.7.5.5 | allowed |
Notice the first row's `dst` value matches the `dst_ip` value of the example lookup table shown in the [previous section](#inputlookup-basics).
To match the `dst` value of the firewall events against `dst_ip` of the lookup file, use a [subsearch](https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchTutorial/Useasubsearch) with `inputlookup`. In this example, the subsearch extracts only the `dst_ip` field and renames it to `dst` so that it matches the field of the same name in the firewall events.
```spl
index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
```
| src | src_port | dst | action |
| ----------- | -------- | ------- | ------- |
| 192.168.1.5 | 45454 | 1.2.3.4 | allowed |
To display the events in table format, append `| table *`.
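To see why the rename matters, it can help to look at what the subsearch turns into. Running the subsearch pipeline on its own with `format` appended shows the generated search string; with the single-row example lookup above, the `search` field should look roughly like `( ( dst="1.2.3.4" ) )`.

```spl
| inputlookup botnet_ip.csv
| fields dst_ip
| rename dst_ip AS dst
| format
```

Each lookup row becomes one parenthesised group and the groups are joined with OR, so the field name inside the group has to exist in the events being searched.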
## Wildcard
An asterisk (`*`) in a lookup file value works as a [wildcard](https://docs.splunk.com/Documentation/SCS/current/Search/Wildcards).
```spl
index=proxy
```
| src | url | dst_port |
| ----------- | ------------- | -------- |
| 192.168.1.5 | foo.com/path1 | 443 |
| 192.168.1.3 | foo.com/path2 | 443 |
| 192.168.1.4 | bar.com/path3 | 443 |
The lookup files do not include a wildcard affix by default.
```spl
| inputlookup urlhaus-filter-splunk-online.csv
```
| host | path | message | updated |
| ------- | ---- | ----------------------------------------- | -------------------- |
| foo.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
The add-on includes the [`geturlhausfilter`](https://gitlab.com/malware-filter/splunk-malware-filter#geturlhausfilter) command, along with other commands, to update their respective lookup files. Those commands have a `wildcard_suffix` argument that adds a new column with a wildcard appended to the specified field's values.
```spl
| geturlhausfilter wildcard_suffix=host
| outputlookup override_if_empty=false urlhaus-filter-splunk-online.csv
```
| host | path | message | updated | host_wildcard_suffix |
| ------- | ---- | ----------------------------------------- | -------------------- | -------------------- |
| foo.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z | foo.com\* |
```spl
index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host_wildcard_suffix | rename host_wildcard_suffix AS url ]
```
| src | url | dst_port |
| ----------- | ------------- | -------- |
| 192.168.1.5 | foo.com/path1 | 443 |
| 192.168.1.3 | foo.com/path2 | 443 |
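With the one-row lookup above, the subsearch is effectively the same as writing the wildcard term by hand. The search below is an illustrative expansion only, not something the add-on generates:

```spl
index=proxy url="foo.com*"
```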
### Wildcard prefix
The previous section showed an example using a wildcard suffix ("foo.com\*"). A wildcard also works as a prefix ("\*foo.com") or even in the middle ("f\*o.com"), though both are [discouraged](https://docs.splunk.com/Documentation/SCS/current/Search/Wildcards#When_to_avoid_wildcard_characters).
```spl
index=proxy
```
| src | domain | dst_port |
| ----------- | ------------- | -------- |
| 192.168.1.5 | foo.com | 443 |
| 192.168.1.3 | lorem.foo.com | 443 |
| 192.168.1.4 | bar.com | 443 |
```spl
| geturlhausfilter wildcard_prefix=host
| outputlookup override_if_empty=false urlhaus-filter-splunk-online.csv
```
| host | path | message | updated | host_wildcard_prefix |
| ------- | ---- | ----------------------------------------- | -------------------- | -------------------- |
| foo.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z | \*foo.com |
```spl
index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host_wildcard_prefix | rename host_wildcard_prefix AS domain ]
```
| src | domain | dst_port |
| ----------- | ------------- | -------- |
| 192.168.1.5 | foo.com | 443 |
| 192.168.1.3 | lorem.foo.com | 443 |
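Note that `*foo.com` also matches unrelated domains that merely end with the same string, such as a hypothetical `notfoo.com`. If only the domain and its subdomains should match, one option is to write the terms by hand. This is a sketch rather than something the add-on produces:

```spl
index=proxy (domain="foo.com" OR domain="*.foo.com")
```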
## Matching multiple fields
File hosting services like Google Docs and Dropbox are commonly abused to host phishing websites. For those sites, the lookup should match both domain and path. When more than one field is specified in the `fields` command, all fields of a row are matched using an AND condition.
```spl
index=proxy
```
| src | domain | path |
| ----------- | ------- | -------------- |
| 192.168.1.5 | foo.com | document1.html |
| 192.168.1.3 | foo.com | document2.html |
| 192.168.1.4 | foo.com | document3.html |
```spl
| inputlookup urlhaus-filter-splunk-online.csv
```
| host | path | message | updated |
| ------- | -------------- | ----------------------------------------- | -------------------- |
| foo.com | document1.html | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
```spl
index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host, path | rename host AS domain ]
```
| src | domain | path |
| ----------- | ------- | -------------- |
| 192.168.1.5 | foo.com | document1.html |
### Matching individual and multiple fields
A lookup file may have rows with an empty `path` to denote that a `domain` should be blocked regardless of path, alongside rows with both `domain` and `path` to denote that only a specific URL should be blocked. The syntax is the same as in the [previous section](#Matching-multiple-fields) because Splunk only matches **non-empty** values; empty values are ignored.
```spl
index=proxy
```
| src | domain | path |
| ----------- | --------------- | ---------------- |
| 192.168.1.5 | bad-domain.com | lorem-ipsum.html |
| 192.168.1.3 | bad-domain.com | foo-bar.html |
| 192.168.1.4 | docs.google.com | malware.exe |
| 192.168.1.4 | docs.google.com | safe.doc |
```spl
| inputlookup urlhaus-filter-splunk-online.csv
```
| host | path | message | updated |
| --------------- | ----------- | ----------------------------------------- | -------------------- |
| bad-domain.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
| docs.google.com | malware.exe | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
```spl
index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host, path | rename host AS domain ]
```
| src | domain | path |
| ----------- | --------------- | ---------------- |
| 192.168.1.5 | bad-domain.com | lorem-ipsum.html |
| 192.168.1.3 | bad-domain.com | foo-bar.html |
| 192.168.1.4 | docs.google.com | malware.exe |
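With the two-row lookup above, the subsearch is roughly equivalent to the hand-written search below. It is shown only to illustrate the expansion:

```spl
index=proxy ((domain="bad-domain.com") OR (domain="docs.google.com" AND path="malware.exe"))
```

The first row contributes only a `domain` condition because its `path` is empty, which is why every path under bad-domain.com matches while `safe.doc` on docs.google.com does not.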
## Case-insensitive
Matching against a lookup file through an `inputlookup` subsearch is case-insensitive. If case-sensitive matching is required, use `lookup` with a lookup definition.
```spl
index=proxy
```
| src | domain |
| ----------- | -------------- |
| 192.168.1.5 | loremipsum.com |
```spl
| inputlookup urlhaus-filter-splunk-online.csv
```
| host | path | message | updated |
| --------------- | ----------- | ----------------------------------------- | -------------------- |
| lOrEmIpSuM.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
| docs.google.com | malware.exe | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
```spl
index=proxy [| inputlookup urlhaus-filter-splunk-online.csv | fields host, path | rename host AS domain ]
```
| src | domain |
| ----------- | -------------- |
| 192.168.1.5 | loremipsum.com |
## CIDR matching
Splunk automatically detects CIDR-like values in a lookup file and performs CIDR matching accordingly. However, this behaviour is on a best-effort basis and may not work as intended. To explicitly use lookup fields for CIDR matching, use `lookup` with a lookup definition.
```spl
index=firewall
```
| src | src_port | dst | action |
| ----------- | -------- | --------------- | ------- |
| 192.168.1.5 | 45454 | 187.190.252.167 | allowed |
| 192.168.1.3 | 45452 | 7.6.5.4 | allowed |
| 192.168.1.4 | 45457 | 4.3.2.1 | allowed |
| 192.168.1.6 | 45451 | 89.248.163.100 | allowed |
```spl
| inputlookup opendbl_ip.csv
```
| start | end | netmask | cidr_range | name | updated |
| --------------- | --------------- | ------- | ------------------ | ----------------------------------------- | -------------------- |
| 187.190.252.167 | 187.190.252.167 | 32 | 187.190.252.167/32 | Emerging Threats: Known Compromised Hosts | 2023-01-30T08:03:00Z |
| 89.248.163.0 | 89.248.163.255 | 24 | 89.248.163.0/24 | Dshield | 2023-01-30T08:01:00Z |
```spl
index=firewall [| inputlookup opendbl_ip.csv | fields cidr_range | rename cidr_range AS dst ]
```
| src | src_port | dst | action |
| ----------- | -------- | --------------- | ------- |
| 192.168.1.5 | 45454 | 187.190.252.167 | allowed |
| 192.168.1.6 | 45451 | 89.248.163.100 | allowed |
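For the example lookup, the subsearch expands to a search along these lines, where a CIDR value compared against an IP address field is treated as a range match:

```spl
index=firewall (dst="187.190.252.167/32" OR dst="89.248.163.0/24")
```

When a single known range needs to be checked explicitly, the `cidrmatch` eval function is an alternative. A sketch using one of the sample ranges:

```spl
index=firewall
| where cidrmatch("89.248.163.0/24", dst)
```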
## inputlookup + lookup
When used as a subsearch, `inputlookup` filters the event data and only outputs rows with matching values of the specified field(s). `lookup` enriches the event data by appending new fields to the rows with matching field values. Another way to understand the difference is that `inputlookup` performs an [inner join](<https://en.wikipedia.org/wiki/Join_(SQL)#Inner_join>) while `lookup` performs a [left outer join](<https://en.wikipedia.org/wiki/Join_(SQL)#Left_outer_join>), where the event data is the left table and the lookup file is the right table.
Despite their differences, it can be useful to use both at the same time to enrich filtered event data, even when using the same lookup file.
```spl
| inputlookup botnet_ip.csv
```
> `_time` field is omitted for brevity.
| first_seen_utc | dst_ip | dst_port | c2_status | last_online | malware | updated |
| ------------------- | ------- | -------- | --------- | ----------- | ------- | -------------------- |
| 2021-05-16 19:49:33 | 1.2.3.4 | 1234 | online | 2023-03-05 | Lorem | 2023-03-04T16:41:17Z |
| 2021-05-16 19:49:33 | 4.3.2.1 | 1234 | online | 2023-03-05 | Ipsum | 2023-03-04T16:41:17Z |
```spl
index=firewall
```
| src | src_port | dst | action |
| ----------- | -------- | ------- | ------- |
| 192.168.1.5 | 45454 | 1.2.3.4 | allowed |
| 192.168.1.3 | 45452 | 7.6.5.4 | allowed |
| 192.168.1.4 | 45457 | 4.3.2.1 | allowed |
| 192.168.1.6 | 45451 | 7.7.5.5 | allowed |
```spl
index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
```
| src | src_port | dst | action |
| ----------- | -------- | ------- | ------- |
| 192.168.1.5 | 45454 | 1.2.3.4 | allowed |
| 192.168.1.4 | 45457 | 4.3.2.1 | allowed |
```spl
index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
| lookup botnet_ip.csv dst_ip AS dst OUTPUT c2_status, malware
```
| src | src_port | dst | action | c2_status | malware |
| ----------- | -------- | ------- | ------- | --------- | ------- |
| 192.168.1.5 | 45454 | 1.2.3.4 | allowed | online | Lorem |
| 192.168.1.4 | 45457 | 4.3.2.1 | allowed | online | Ipsum |
It is also possible to rename lookup destination fields.
```spl
index=firewall [| inputlookup botnet_ip.csv | fields dst_ip | rename dst_ip AS dst]
| lookup botnet_ip.csv dst_ip AS dst OUTPUT c2_status AS "C2 Server Status", malware AS "Malware Family"
```
| src | src_port | dst | action | C2 Server Status | Malware Family |
| ----------- | -------- | ------- | ------- | ---------------- | -------------- |
| 192.168.1.5 | 45454 | 1.2.3.4 | allowed | online | Lorem |
| 192.168.1.4 | 45457 | 4.3.2.1 | allowed | online | Ipsum |
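The join analogy can be checked with `lookup` alone: enriching every event is the left outer join, and dropping the events that received no lookup fields turns it into the inner join. A sketch using the same sample data:

```spl
index=firewall
| lookup botnet_ip.csv dst_ip AS dst OUTPUT c2_status, malware
| where isnotnull(malware)
```

This should return the same two events as the combined search above. The difference is that here every firewall event is retrieved and enriched before the filter is applied.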
## Lookup definition
A lookup definition provides matching rules for a lookup file. It can be configured for case-sensitivity, wildcard matching, CIDR matching and more through [transforms.conf](https://docs.splunk.com/Documentation/Splunk/latest/Admin/Transformsconf). It can also be configured via Splunk Web: Settings -> Lookups -> Lookup definitions.
A bare-minimum lookup definition looks like this:
```conf transforms.conf
[lookup-definition-name]
filename = lookup-filename.csv
```
transforms.conf can be saved in the following directories in [order of priority](https://docs.splunk.com/Documentation/Splunk/latest/Admin/Wheretofindtheconfigurationfiles) (highest to lowest):
- `$SPLUNK_HOME/etc/users/<username>/<app_name>/local/`
- `$SPLUNK_HOME/etc/apps/<app_name>/local/`
- `$SPLUNK_HOME/etc/system/local/`
My naming convention for a lookup definition is simply to remove the `.csv` extension, e.g. "example.csv" (lookup file), "example" (lookup definition). While it is possible to name a lookup definition with a file extension ("example.csv"), I discourage it to avoid confusion.
It is imperative to note that a lookup definition only applies to the `lookup` search command and does _not_ apply to `inputlookup`. Although `inputlookup` accepts a lookup definition as a lookup table (in addition to a lookup file), the definition's matching rules will be ignored.
### Case-sensitive
```conf transforms.conf
[urlhaus-filter-splunk-online]
filename = urlhaus-filter-splunk-online.csv
# applies to all fields
case_sensitive_match = 1
```
```spl
index=proxy
```
| src | domain | path |
| ----------- | -------------- | ---------------- |
| 192.168.1.5 | bad-domain.com | lorem-ipsum.html |
| 192.168.1.3 | bad-domain.com | lOrEm-iPsUm.hTmL |
```spl
| inputlookup urlhaus-filter-splunk-online
```
| host | path | message | updated |
| -------------- | ---------------- | ----------------------------------------- | -------------------- |
| bad-domain.com | lorem-ipsum.html | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z |
```spl
index=proxy
| lookup urlhaus-filter-splunk-online host AS domain, path OUTPUT message
```
| src | domain | path | message |
| ----------- | -------------- | ---------------- | ----------------------------------------- |
| 192.168.1.5 | bad-domain.com | lorem-ipsum.html | urlhaus-filter malicious website detected |
| 192.168.1.3 | bad-domain.com | lOrEm-iPsUm.hTmL | |
### Wildcard (lookup)
```conf transforms.conf
[urlhaus-filter-splunk-online]
filename = urlhaus-filter-splunk-online.csv
match_type = WILDCARD(host_wildcard_suffix)
```
```spl
index=proxy
```
| src | url | dst_port |
| ----------- | ------------- | -------- |
| 192.168.1.5 | foo.com/path1 | 443 |
| 192.168.1.3 | foo.com/path2 | 443 |
| 192.168.1.4 | bar.com/path3 | 443 |
The lookup file must contain the wildcard-suffixed field, generated with `wildcard_suffix=host` as described in the Wildcard section.
```spl
| inputlookup urlhaus-filter-splunk-online
```
| host | path | message | updated | host_wildcard_suffix |
| ------- | ---- | ----------------------------------------- | -------------------- | -------------------- |
| foo.com | | urlhaus-filter malicious website detected | 2023-03-13T00:11:20Z | foo.com\* |
```spl
index=proxy
| lookup urlhaus-filter-splunk-online host_wildcard_suffix AS url OUTPUT message
```
| src | url | dst_port | message |
| ----------- | ------------- | -------- | ----------------------------------------- |
| 192.168.1.5 | foo.com/path1 | 443 | urlhaus-filter malicious website detected |
| 192.168.1.3 | foo.com/path2 | 443 | urlhaus-filter malicious website detected |
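Because `lookup` only enriches, events that match nothing in the lookup are still returned, just without a `message` value. To keep only the hits, filter on the output field. A minimal sketch:

```spl
index=proxy
| lookup urlhaus-filter-splunk-online host_wildcard_suffix AS url OUTPUT message
| search message=*
```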
### CIDR-matching (lookup)
```conf transforms.conf
[opendbl_ip]
filename = opendbl_ip.csv
match_type = CIDR(cidr_range)
```
```spl
index=firewall
```
| src | src_port | dst | action |
| ----------- | -------- | --------------- | ------- |
| 192.168.1.5 | 45454 | 187.190.252.167 | allowed |
| 192.168.1.3 | 45452 | 7.6.5.4 | allowed |
| 192.168.1.4 | 45457 | 4.3.2.1 | allowed |
| 192.168.1.6 | 45451 | 89.248.163.100 | allowed |
```spl
| inputlookup opendbl_ip
```
| start | end | netmask | cidr_range | name | updated |
| --------------- | --------------- | ------- | ------------------ | ----------------------------------------- | -------------------- |
| 187.190.252.167 | 187.190.252.167 | 32 | 187.190.252.167/32 | Emerging Threats: Known Compromised Hosts | 2023-01-30T08:03:00Z |
| 89.248.163.0 | 89.248.163.255 | 24 | 89.248.163.0/24 | Dshield | 2023-01-30T08:01:00Z |
```spl
index=firewall
| lookup opendbl_ip cidr_range AS dst OUTPUT name AS threat
```
| src | src_port | dst | action | threat |
| ----------- | -------- | --------------- | ------- | ----------------------------------------- |
| 192.168.1.5 | 45454 | 187.190.252.167 | allowed | Emerging Threats: Known Compromised Hosts |
| 192.168.1.3 | 45452 | 7.6.5.4 | allowed | |
| 192.168.1.4 | 45457 | 4.3.2.1 | allowed | |
| 192.168.1.6 | 45451 | 89.248.163.100 | allowed | Dshield |
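To turn the enriched events into a short list of hits, the rows without a `threat` value can be dropped and the remainder summarised. A sketch, assuming a count per source, destination and threat feed is what is wanted:

```spl
index=firewall
| lookup opendbl_ip cidr_range AS dst OUTPUT name AS threat
| where isnotnull(threat)
| stats count BY src, dst, threat
```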