Reorganise files
parent 4d98b65acb
commit 5962e0ab28
@@ -34,6 +34,7 @@ deploy:
 - cd build
 
 # Give execute permission to scripts
+- cd src/
 - chmod 700 umbrella-top-1m.sh script.sh commit.sh
 
 # Download Umbrella Popularity List
@@ -46,6 +47,7 @@ deploy:
 - ./commit.sh
 
 # Push the commit
+- cd ../
 - git push
 
 only:
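The two CI hunks add `cd src/` before the `chmod` step and `cd ../` before `git push`. The following is a minimal sketch of that sequence as a hypothetical `run_deploy_steps` function. It assumes the three scripts live in `src/` and run in the order umbrella-top-1m.sh, script.sh, commit.sh, because the full job is not shown in this commit. It leaves out `git push` because that step needs a configured remote. Running the steps in a subshell returns to the parent directory automatically.

```shell
#!/bin/sh
# Hypothetical helper mirroring the deploy steps; $1 is the repository root.
# Assumption: the helper scripts are in $1/src and run in this order.
run_deploy_steps() {
  (
    # The subshell keeps the directory change local, so no `cd ../` is needed.
    cd "$1/src" || exit 1
    # 700: the owner may read, write and execute; group and others get nothing.
    chmod 700 umbrella-top-1m.sh script.sh commit.sh
    ./umbrella-top-1m.sh && ./script.sh && ./commit.sh
  )
}

# Usage: run_deploy_steps /path/to/checkout
```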
@@ -15,9 +15,9 @@ https://gitlab.com/curben/urlhaus/raw/master/urlhaus-filter.txt
 Following URL categories are removed from the database dump:
 
 - Offline URL
-- Well-known host ([top-1m.txt](top-1m.txt)) or false positives ([exclude.txt](exclude.txt))
+- Well-known host ([top-1m.txt](src/top-1m.txt)) or false positives ([exclude.txt](src/exclude.txt))
 
-Database dump is saved as [URLhaus.csv](URLhaus.csv), processed by [script.sh](script.sh) and output as [urlhaus-filter.txt](urlhaus-filter.txt).
+Database dump is saved as [src/URLhaus.csv](URLhaus.csv), processed by [script.sh](utils/script.sh) and output as [urlhaus-filter.txt](urlhaus-filter.txt).
 
 ## Note
 
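The README lists offline URLs as one of the removed categories, but this commit does not show how script.sh detects them. The following is only a sketch against a simplified, hypothetical layout of `"url","status"` rows with `#` comment lines. The real URLhaus dump has more columns. The function name `drop_offline` is illustrative.

```shell
#!/bin/sh
# Hypothetical filter: print the URL of every row not marked offline.
# Assumed input: lines like "http://host/path","online"; lines starting with # are skipped.
drop_offline() {
  awk -F'","' '
    /^#/ { next }                  # skip comment lines
    {
      url = $1
      sub(/^"/, "", url)           # strip the leading quote
      status = $2
      sub(/"$/, "", status)        # strip the trailing quote
      if (status != "offline") print url
    }'
}
```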
@@ -1 +0,0 @@
-o.aolcdn.com
Can't render this file because it is too large.
@@ -0,0 +1 @@
+# Nothing yet...
@@ -11,7 +11,7 @@ FIFTH_LINE="! Source: https://urlhaus.abuse.ch/api/"
 COMMENT="$FIRST_LINE\n$SECOND_LINE\n$THIRD_LINE\n$FOURTH_LINE\n$FIFTH_LINE"
 
 # Download the database dump
-wget https://urlhaus.abuse.ch/downloads/csv/ -O URLhaus.csv
+wget https://urlhaus.abuse.ch/downloads/csv/ -O ../src/URLhaus.csv
 
 # Parse domains and IP address only
 cat URLhaus.csv | \
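The parsing stages between `cat URLhaus.csv` and `sort -u` are elided in this diff. Only the final `cut -f 1 -d ':'` is visible, in the next hunk header. The following is a rough sketch of reducing URLs to bare hosts or IP addresses, using a hypothetical `url_to_host` function. It assumes one plain http(s) URL per line, with no user-info or IPv6 literals. It does not reproduce the script's actual CSV handling.

```shell
#!/bin/sh
# Hypothetical helper: URLs on stdin, one host or IP per line on stdout.
url_to_host() {
  sed -e 's|^[A-Za-z][A-Za-z0-9+.-]*://||' |  # drop the scheme
    cut -f 1 -d '/' |                         # drop the path
    cut -f 1 -d ':'                           # drop any port, like the visible script.sh step
}
```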
@@ -22,8 +22,8 @@ cut -f 1 -d ':' | \
 # Sort and remove duplicates
 sort -u | \
 # Exclude Umbrella Top 1M
-grep -vf top-1m.txt | \
+grep -vf ../src/top-1m.txt | \
 # Exclude false positive
-grep -vf exclude.txt | \
+grep -vf ../src/exclude.txt | \
 # Append header comment to the filter list
-sed '1 i\'"$COMMENT"'' > urlhaus-filter.txt
+sed '1 i\'"$COMMENT"'' > ../urlhaus-filter.txt
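The following is a sketch of the remaining stages in script.sh, written as a hypothetical `build_filter` function: sort and deduplicate, drop hosts found in two exclusion files, then prepend a header. It assumes one host per line on stdin and takes the Top 1M list and the false-positive list as arguments. It deliberately departs from the script in two ways. First, it uses `grep -vxFf` (fixed strings, whole-line match) rather than `grep -vf`. Second, it prints the header with `printf` rather than `sed '1 i\...'`. The header text is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: $1 = well-known hosts file, $2 = false-positive file.
build_filter() {
  sort -u |
    grep -vxFf "$1" |   # drop well-known hosts (exact, literal matches)
    grep -vxFf "$2" |   # drop false positives
    { printf '%s\n' '! Title: example filter'; cat; }  # placeholder header
}
```

With plain `grep -vf`, each line of the exclusion file is a regular expression matched anywhere in the host. An entry such as `example.com` would therefore also remove `notexample.com`, and its `.` would match any character. Adding `-x` and `-F` avoids both effects.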
@@ -9,4 +9,4 @@ wget -O- http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip | \
 # Unzip
 funzip | \
 # Parse domains only
-cut -f 2 -d ',' > top-1m.txt
+cut -f 2 -d ',' > ../src/top-1m.txt
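umbrella-top-1m.sh streams the zipped Umbrella list through `funzip`, which extracts the first archive member to stdout, and then keeps only the second CSV field. The following sketch covers just the parsing step, using a hypothetical `csv_to_domains` function. It assumes `rank,domain` rows and feeds sample CSV directly on stdin so the example runs offline.

```shell
#!/bin/sh
# Hypothetical helper: "rank,domain" rows on stdin, one domain per line on stdout.
csv_to_domains() {
  cut -f 2 -d ','
}

# Hypothetical sample rows; prints example.com then example.net.
printf '1,example.com\n2,example.net\n' | csv_to_domains
```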