Monday, September 30, 2019

How big can a Suricata dataset be?

After poking around with datasets yesterday, I was curious how many domain sha256 hashes could be loaded with the default dataset configuration.
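
For reference, a dataset like this gets used from a rule. A minimal sketch along the lines of the Suricata 5 dataset docs (the msg and sid here are placeholders of mine):

alert dns any any -> any any (msg:"DNS query for a domain in dns-seen"; dns.query; to_sha256; dataset:isset,dns-seen; sid:1234; rev:1;)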

The list came from https://www.domcop.com/top-10-million-domains; I just grabbed chunks with 'head -n' against the list of domains in the CSV export (only the 2nd field of the CSV, the actual domains).
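
Roughly, the list file can be recreated like this (a sketch, assuming the domain sits unquoted in the 2nd CSV field; the filenames and head count are illustrative, not exactly what I ran):

# skip the CSV header, take the domain column, keep the first chunk,
# and write one sha256 hash per line for suricata to load
tail -n +2 top10milliondomains.csv | cut -d',' -f2 | head -n 160000 | \
  while read -r domain; do
    printf '%s' "$domain" | sha256sum | cut -d' ' -f1
  done > topdomains.155k.lst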


I was able to load 155,802 hashes/domains!

[11148] 30/9/2019 -- 21:48:12 - (datasets.c:275) <Config> (DatasetLoadSha256) -- dataset: dns-seen loaded 155802 records
[11148] 30/9/2019 -- 21:48:12 - (datasets.c:491) <Debug> (DatasetGet) -- set 0x55d95b6753e0/dns-seen type 3 save ./topdomains.155k.lst load ./topdomains.155k.lst
[11148] 30/9/2019 -- 21:48:12 - (datasets.c:577) <Debug> (DatasetsInit) -- dataset dns-seen: id 0 type sha256
[11148] 30/9/2019 -- 21:48:12 - (datasets.c:591) <Debug> (DatasetsInit) -- datasets done: 0x55d95b577850


At first I thought this might be tunable, but the suricata.log entries referencing memcap and hash-size appear to just be utility lookups from util-thash.c. (I'll check with the OISF folks to make sure I'm not missing something.)

[10211] 30/9/2019 -- 21:12:56 - (conf.c:335) <Debug> (ConfGet) -- failed to lookup configuration parameter 'dns-seen.memcap'
[10211] 30/9/2019 -- 21:12:56 - (conf.c:335) <Debug> (ConfGet) -- failed to lookup configuration parameter 'dns-seen.hash-size'
[10211] 30/9/2019 -- 21:12:56 - (conf.c:335) <Debug> (ConfGet) -- failed to lookup configuration parameter 'dns-seen.hash-size'


I'm not sure there's a use case yet for that many items in a list, but it's nice to know the option is there :)

--- Update October 1, 2019 ---
The hash-size and memcap can be configured per dataset (thanks, Victor!). Example snippet:

datasets:
  - dns-seen:
      type: sha256
      state: topdomains.lst

dns-seen:
  memcap: 1024mb
  hash-size: 1024mb

With the memcap and hash-size additions, I was able to load all 10,000,000 domain hashes. While this is hardly practical, it can be done :)

real    78m25.400s
user    78m2.285s
sys    0m4.419s

[4544] 1/10/2019 -- 21:09:31 - (host.c:276) <Config> (HostInitConfig) -- preallocated 1000 hosts of size 136
[4544] 1/10/2019 -- 21:09:31 - (host.c:278) <Config> (HostInitConfig) -- host memory usage: 398144 bytes, maximum: 33554432
[4544] 1/10/2019 -- 21:09:31 - (util-coredump-config.c:142) <Config> (CoredumpLoadConfig) -- Core dump size is unlimited.
[4544] 1/10/2019 -- 21:09:31 - (util-conf.c:98) <Notice> (ConfigGetDataDirectory) -- returning '.'
[4544] 1/10/2019 -- 21:09:31 - (datasets.c:219) <Config> (DatasetLoadSha256) -- dataset: dns-seen loading from './topdomains.lst'
[4544] 1/10/2019 -- 22:27:16 - (datasets.c:275) <Config> (DatasetLoadSha256) -- dataset: dns-seen loaded 10000000 records
[4544] 1/10/2019 -- 22:27:16 - (defrag-hash.c:245) <Config> (DefragInitConfig) -- allocated 3670016 bytes of memory for the defrag hash... 65536 buckets of size 56
[4544] 1/10/2019 -- 22:27:16 - (defrag-hash.c:272) <Config> (DefragInitConfig) -- preallocated 65535 defrag trackers of size 160
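
(The real/user/sys figures above came from wrapping the run in time(1). A hypothetical invocation; the run mode and pcap name are placeholders, since any startup that triggers DatasetsInit pays the same load cost:)

time suricata -c suricata.yaml -r test.pcap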
