Reorganise files

curben 2018-10-10 17:14:49 +10:30
parent 4d98b65acb
commit 5962e0ab28
9 changed files with 10 additions and 8 deletions

@@ -34,6 +34,7 @@ deploy:
- cd build
# Give execute permission to scripts
+- cd src/
- chmod 700 umbrella-top-1m.sh script.sh commit.sh
# Download Umbrella Popularity List
@@ -46,6 +47,7 @@ deploy:
- ./commit.sh
# Push the commit
+- cd ../
- git push
only:
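
Read as plain shell, the deploy steps visible in this job amount to roughly the sequence below. This is only a sketch assembled from the lines shown in the diff; anything outside the visible hunk context is marked as elided rather than guessed.

```sh
cd build
# Give execute permission to scripts
cd src/
chmod 700 umbrella-top-1m.sh script.sh commit.sh
# Download Umbrella Popularity List
# ... (steps not shown in this diff) ...
./commit.sh
# Push the commit
cd ../
git push
```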

@@ -15,9 +15,9 @@ https://gitlab.com/curben/urlhaus/raw/master/urlhaus-filter.txt
Following URL categories are removed from the database dump:
- Offline URL
-- Well-known host ([top-1m.txt](top-1m.txt)) or false positives ([exclude.txt](exclude.txt))
+- Well-known host ([top-1m.txt](src/top-1m.txt)) or false positives ([exclude.txt](src/exclude.txt))
-Database dump is saved as [URLhaus.csv](URLhaus.csv), processed by [script.sh](script.sh) and output as [urlhaus-filter.txt](urlhaus-filter.txt).
+Database dump is saved as [src/URLhaus.csv](URLhaus.csv), processed by [script.sh](utils/script.sh) and output as [urlhaus-filter.txt](urlhaus-filter.txt).
## Note
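
The excerpt above describes the filtering pipeline: hosts parsed from the URLhaus dump are dropped when they appear in the Umbrella top-1m list or in the manual exclude list, and everything else ends up in urlhaus-filter.txt. A minimal, self-contained illustration of that exclusion step using the same grep -vf approach (the file names and hosts below are made up for the example; the real script reads far larger lists):

```sh
# Hypothetical parsed hosts and tiny exclusion lists, for illustration only.
printf 'example.com\nbadhost.example\no.aolcdn.com\n' > hosts.txt
printf 'example.com\n' > top-1m.txt     # well-known host
printf 'o.aolcdn.com\n' > exclude.txt   # false positive
grep -vf top-1m.txt hosts.txt | grep -vf exclude.txt
# Prints only: badhost.example
```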

@@ -1 +0,0 @@
-o.aolcdn.com

(One changed file could not be rendered because it is too large.)

src/exclude.txt (new file, 1 line added)

@@ -0,0 +1 @@
+# Nothing yet...

@@ -11,7 +11,7 @@ FIFTH_LINE="! Source: https://urlhaus.abuse.ch/api/"
COMMENT="$FIRST_LINE\n$SECOND_LINE\n$THIRD_LINE\n$FOURTH_LINE\n$FIFTH_LINE"
# Download the database dump
-wget https://urlhaus.abuse.ch/downloads/csv/ -O URLhaus.csv
+wget https://urlhaus.abuse.ch/downloads/csv/ -O ../src/URLhaus.csv
# Parse domains and IP address only
cat URLhaus.csv | \
@@ -22,8 +22,8 @@ cut -f 1 -d ':' | \
# Sort and remove duplicates
sort -u | \
# Exclude Umbrella Top 1M
-grep -vf top-1m.txt | \
+grep -vf ../src/top-1m.txt | \
# Exclude false positive
-grep -vf exclude.txt | \
+grep -vf ../src/exclude.txt | \
# Append header comment to the filter list
-sed '1 i\'"$COMMENT"'' > urlhaus-filter.txt
+sed '1 i\'"$COMMENT"'' > ../urlhaus-filter.txt
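
To make the middle of that pipeline concrete: cut -f 1 -d ':' keeps everything before the first colon (so a trailing :port is dropped) and sort -u de-duplicates the hosts. A small stand-alone illustration with made-up input (the real script feeds it from the downloaded URLhaus.csv):

```sh
# Hypothetical hosts, shaped like the lines the earlier parsing steps emit.
printf 'evil.example:8080\nmalware.example\nmalware.example\n' | \
# Keep only the part before ':' (drops a :port suffix)
cut -f 1 -d ':' | \
# Sort and remove duplicates
sort -u
# Prints:
#   evil.example
#   malware.example
```

The final sed '1 i\'"$COMMENT"'' step then prepends the header block, with the literal \n sequences in $COMMENT expanded into separate lines by GNU sed.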

@@ -9,4 +9,4 @@ wget -O- http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip | \
# Unzip
funzip | \
# Parse domains only
-cut -f 2 -d ',' > top-1m.txt
+cut -f 2 -d ',' > ../src/top-1m.txt
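
The Umbrella list is a rank,domain CSV, so the final cut keeps only the second field (the domain). A tiny illustration with made-up lines in the same shape:

```sh
# Hypothetical rank,domain lines like those in the Umbrella top-1m.csv.
printf '1,example.com\n2,example.net\n' | \
# Keep the second comma-separated field (the domain)
cut -f 2 -d ','
# Prints:
#   example.com
#   example.net
```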