Add umbrella top 1m
https://umbrella.cisco.com/blog/2016/12/14/cisco-umbrella-1-million/
parent 3dcc053d0c
commit 4d98b65acb
@@ -34,7 +34,10 @@ deploy:
  - cd build

  # Give execute permission to scripts
- - chmod 700 script.sh commit.sh
+ - chmod 700 umbrella-top-1m.sh script.sh commit.sh

+ # Download Umbrella Popularity List
+ - ./umbrella-top-1m.sh
+
  # Download database dump and process it
  - ./script.sh
11
README.md
@@ -15,13 +15,13 @@ https://gitlab.com/curben/urlhaus/raw/master/urlhaus-filter.txt
  Following URL categories are removed from the database dump:

  - Offline URL
- - Well-known host or false positives (see [exclude.txt](exclude.txt))
+ - Well-known host ([top-1m.txt](top-1m.txt)) or false positives ([exclude.txt](exclude.txt))

  Database dump is saved as [URLhaus.csv](URLhaus.csv), processed by [script.sh](script.sh) and output as [urlhaus-filter.txt](urlhaus-filter.txt).

  ## Note

- Please report any false positive, especially if the domain is one of the Alexa 10M.
+ Please report any false positive.

  This filter **only** accepts malware URLs from [URLhaus](https://urlhaus.abuse.ch/).

@@ -34,7 +34,10 @@ This repo is not endorsed by Abuse.sh.
  - Can you add this *very-bad-url.com* to the filter?
  + No, please report to the [upstream](https://urlhaus.abuse.ch/api/#submit).

- - Why do you need to clone the repo again in your CI?
+ - Why don't you use the URLhaus "Plain-Text URL List"?
+ + It doesn't show the status (online/offline) of a URL.
+
+ - Why do you need to clone the repo again in your CI? I thought CI already fetch the repo by default?
  + GitLab Runner clone/fetch the repo using HTTPS method by default ([log](https://gitlab.com/curben/urlhaus/-/jobs/105979394)). This method requires deploy *token* which is *read-only* (cannot push).
  + Deploy *key* has write access but cannot be used with the HTTPS method, hence, the workaround to clone using SSH.
- + See issue [#20567](https://gitlab.com/gitlab-org/gitlab-ce/issues/20567) and [#20845](https://gitlab.com/gitlab-org/gitlab-ce/issues/20845).
+ + See issue [#20567](https://gitlab.com/gitlab-org/gitlab-ce/issues/20567) and [#20845](https://gitlab.com/gitlab-org/gitlab-ce/issues/20845).

@@ -13,7 +13,7 @@ COMMENT="$FIRST_LINE\n$SECOND_LINE\n$THIRD_LINE\n$FOURTH_LINE\n$FIFTH_LINE"
  # Download the database dump
  wget https://urlhaus.abuse.ch/downloads/csv/ -O URLhaus.csv

- # Parse domain name and IP address only
+ # Parse domains and IP address only
  cat URLhaus.csv | \
  grep '"online"' | \
  cut -f 6 -d '"' | \
@@ -21,6 +21,8 @@ cut -f 3 -d '/' | \
  cut -f 1 -d ':' | \
  # Sort and remove duplicates
  sort -u | \
+ # Exclude Umbrella Top 1M
+ grep -vf top-1m.txt | \
  # Exclude false positive
  grep -vf exclude.txt | \
  # Append header comment to the filter list

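A note on the two exclusion steps in the hunk above: `grep -vf FILE` treats each line of FILE as a basic regular expression and matches it anywhere in an input line. A list entry such as `t.co` would therefore also drop `microsoft.com`. If exact host matching is the intent (an assumption; the commit does not say), one possible sketch uses fixed-string, whole-line matching instead. `exclude_exact` is a hypothetical helper name and is not part of this commit.

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): remove every stdin line that is
# exactly equal to some line of the list file passed as $1.
exclude_exact() {
  # -F: fixed strings, -x: whole-line match, -v: invert, -f: read patterns from file.
  # grep exits 1 when it selects no lines; treat that as success but keep
  # real errors (exit status 2) visible to the caller.
  grep -Fxv -f "$1" || [ "$?" -eq 1 ]
}

# Example: only the exact entry is dropped; the look-alike hosts survive.
list=$(mktemp)
printf 'google.com\n' > "$list"
printf 'google.com\nnotgoogle.com\ngoogle.com.example\n' | exclude_exact "$list"
rm -f "$list"
```

Under that assumption, the pipeline steps would read `exclude_exact top-1m.txt | exclude_exact exclude.txt | \`. Whether subdomains of a listed host should also be excluded is a separate design choice that this sketch does not address.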
File diff suppressed because it is too large
Load Diff
@@ -0,0 +1,12 @@
+ #!/bin/sh
+
+ # Download the Cisco Umbrella 1 Million
+ # More info:
+ # https://s3-us-west-1.amazonaws.com/umbrella-static/index.html
+
+ # Download the list
+ wget -O- http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip | \
+ # Unzip
+ funzip | \
+ # Parse domains only
+ cut -f 2 -d ',' > top-1m.txt

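The parsing step above keeps the second comma-separated field of each line, which implies rows of the form `rank,domain`. A minimal sketch of that step as a standalone function follows. `parse_domains` is a hypothetical name, and stripping carriage returns is a defensive assumption in case the CSV uses CRLF line endings, which the commit does not state.

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): read "rank,domain" rows on stdin
# and print only the domain column.
parse_domains() {
  # Assumption: drop any carriage returns so a CRLF file does not leave a
  # trailing \r on each domain, then keep the second comma-separated field.
  tr -d '\r' | cut -d ',' -f 2
}

# Example with two made-up rows in the assumed format.
printf '1,google.com\r\n2,netflix.com\r\n' | parse_domains
```

In the script above it would take the place of the final `cut`, for example `funzip | parse_domains > top-1m.txt`.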