Commit Graph

26 Commits

Author SHA1 Message Date
curben f9e1cb84ce fix: run dos2unix before text processing
rename urlhaus.txt in tmp/
2019-05-28 09:59:02 +09:30
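The commit above runs dos2unix before any text processing. Here is a minimal sketch of why line endings matter. It assumes a hypothetical CRLF input file and uses `tr -d '\r'` as a portable stand-in for dos2unix, which may not be installed everywhere. A whole-line match on `example.com` fails until the carriage returns are removed.

```shell
#!/bin/sh
# Hypothetical CRLF input, like a file produced on Windows.
tmpdir=$(mktemp -d)
printf 'example.com\r\nexample.org\r\n' > "$tmpdir/crlf.txt"

# Whole-line match fails: each line really ends in "example.com\r".
# grep exits 1 when nothing matches, so '|| true' keeps the "0" count without failing.
before=$(grep -c -x -F 'example.com' "$tmpdir/crlf.txt" || true)

# Strip carriage returns (stand-in for dos2unix), then search again.
tr -d '\r' < "$tmpdir/crlf.txt" > "$tmpdir/lf.txt"
after=$(grep -c -x -F 'example.com' "$tmpdir/lf.txt")

echo "matches before: $before, after: $after"   # matches before: 0, after: 1
rm -r "$tmpdir"
```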
curben 3c1384b95b chore(commit): add 'set -e -x' 2019-05-28 09:53:29 +09:30
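A small illustration of what `set -e -x` does. This is a hypothetical three-command script, not the project's own script. It runs in a child shell so the parent can inspect the result: tracing shows each command, and the script stops at the first failure.

```shell
#!/bin/sh
# Run a tiny script under 'set -e -x' in a child shell so the effect can be inspected.
#   -e: exit at the first command that returns a non-zero status
#   -x: print each command to stderr (prefixed with "+") before running it
status=0
output=$(sh -c 'set -e -x; echo "step 1"; false; echo "step 2"' 2>&1) || status=$?

# "step 2" never appears: the child shell stopped at 'false'.
printf '%s\n' "$output"
echo "exit status: $status"   # exit status: 1
```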
curben 9a5fdb2be6 fix: use simple URL list
we no longer care about the status of each URL
bb817d9838
2019-05-27 15:59:08 +09:30
curben bb817d9838 fix: use all URLs including offline's
upstream (urlhaus.abuse.ch) incorrectly marks many online URLs as offline.
noticed in 6c7faa95f7
2019-05-27 15:10:12 +09:30
curben a7046c77a6 refactor: move script executions from CI config to index.sh
easier to test locally
2019-05-27 15:01:57 +09:30
curben ea6c3f6796 refactor: remove '-e' parameter of sed
not necessary if there is only one script
https://unix.stackexchange.com/a/33159
2019-05-17 18:13:26 +09:30
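The point of ea6c3f6796, shown with arbitrary sample input. With a single sed script, `-e` changes nothing. Repeating `-e` is one way to pass several scripts in one invocation.

```shell
#!/bin/sh
# With a single sed script, '-e' is optional: these two commands are equivalent.
a=$(printf 'foo bar\n' | sed -e 's/foo/baz/')
b=$(printf 'foo bar\n' | sed 's/foo/baz/')

# '-e' matters once there is more than one script on the command line.
c=$(printf 'foo bar\n' | sed -e 's/foo/baz/' -e 's/bar/qux/')

echo "$a"   # baz bar
echo "$b"   # baz bar
echo "$c"   # baz qux
```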
curben 43fdf9893f docs: clarify malware-url-top-domains.sh 2019-05-12 12:48:13 +09:30
curben 6e1a6b4c58 style: fix typo in comment 2019-05-12 12:48:13 +09:30
curben 013267e310 perf: grep using urlhaus-top-domains.txt instead of much larger top-1m.txt 2019-05-12 12:48:13 +09:30
curben f700065788 chore(ci): move script.sh to CI config
to stop the build as soon as the current script fails,
especially if wget fails in prerequisites.sh
2019-05-11 20:27:14 +09:30
curben 5beecca906 feat: include full URL for popular domains 2019-05-11 18:49:25 +09:30
curben 1547bb0e96 Remove top-1m.txt
The dataset is not in the public domain and may be subject to a copyright claim by Umbrella/Cisco
2018-10-22 13:40:22 +10:30
curben 03ea9f517b Update repo link 2018-10-12 11:39:37 +10:30
curben 8c9b62b9c1 Fix header comment 2018-10-11 15:11:33 +10:30
curben 4566803d67 Add filter update frequency to 1 day
Filter is updated twice a day
2018-10-11 14:49:08 +10:30
curben 1449c6ec47 Use simpler sed syntax
for matching beginning of a line
2018-10-11 14:40:18 +10:30
curben 6c030d840e Use cross-platform sed syntax
https://stackoverflow.com/questions/1251999/how-can-i-replace-a-newline-n-using-sed?page=1&tab=votes#comment9175314_1252191
2018-10-11 14:24:31 +10:30
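Here is the newline-joining idiom from the linked Stack Overflow answer, written in the split `-e` form commonly recommended for cross-platform use. This is a sketch with sample input. The exact expression in the project's script is not shown in this log.

```shell
#!/bin/sh
# Join all input lines into one, replacing each newline with a space.
#   ':a'      defines label a
#   'N'       appends the next input line to the pattern space
#   '$!ba'    branches back to a unless this is the last line
#   's/\n/ /g' replaces every embedded newline with a space
# Giving each command its own -e avoids putting ';' right after the label,
# which some sed implementations (e.g. BSD/macOS) do not treat as a separator there.
joined=$(printf 'one\ntwo\nthree\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')
echo "$joined"   # one two three
```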
curben 88d6447fe0 Use dos2unix instead of sed
Add sed workaround for matching new line
https://stackoverflow.com/a/1252191
2018-10-11 14:16:51 +10:30
curben e4dc980c96 Match whole line for faster search
Use unix line ending as standard
2018-10-11 13:50:48 +10:30
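This sketch shows whole-line matching with grep's `-x` flag. That flag is an assumption, because the log does not show how the script anchors its matches. The sample files are hypothetical. Without `-x`, a domain pattern also matches its subdomains.

```shell
#!/bin/sh
# Hypothetical sample data: domains to check, and a short list of domains to look for.
tmpdir=$(mktemp -d)
printf 'example.com\nsub.example.com\nexample.org\n' > "$tmpdir/domains.txt"
printf 'example.com\n' > "$tmpdir/patterns.txt"

# -F: fixed strings (no regex), -f: read patterns from a file.
# Without -x a pattern may match anywhere in a line, so sub.example.com also matches.
loose_count=$(grep -c -F -f "$tmpdir/patterns.txt" "$tmpdir/domains.txt")

# With -x a pattern must match the entire line.
exact=$(grep -F -x -f "$tmpdir/patterns.txt" "$tmpdir/domains.txt")

echo "substring matches: $loose_count"   # substring matches: 2
echo "whole-line match: $exact"          # whole-line match: example.com
rm -r "$tmpdir"
```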
curben 64de7976cc Use unzip
alpine's gunzip/gzip/zcat don't support zip
2018-10-11 12:20:00 +10:30
curben fe3e3750a2 Use gunzip instead of zcat
alpine's zcat supports only gzip/bzip2/xz not zip
2018-10-11 11:43:01 +10:30
curben 073f2b0faf wget and save it into a file
instead of stdout
2018-10-11 11:23:53 +10:30
curben f453b3397d Fix path 2018-10-11 11:12:35 +10:30
curben via GitLab Runner f8b56a2816 Filter updated: Thu, 11 Oct 2018 00:26:20 UTC 2018-10-11 00:26:21 +00:00
curben f74cba26b7 Use included zcat 2018-10-10 18:18:36 +10:30
curben 5962e0ab28 Reorganise files 2018-10-10 17:14:49 +10:30