Thoughts on Security and Stuff: IDS

Friday, October 11, 2019

More Dataset Performance Notes

-- Update: Forgot to mention we have a number of PTResearch rules too --

In the last post we covered the performance of a single rule using a large (10 million record) dataset.

Since it's been running for a few days now, I wanted to see what performance was like with a dataset-based rule using the same large dataset alongside our mix of VRT/ET/PTResearch/custom rules.

Here's where things stand.

Our current uptime is:
tail -n 1 stats.json | jq 'select(.event_type=="stats").stats.uptime'
595932 (~6.9 days)

Our current packet status:
tail -n 1 stats.json | jq -c 'select(.event_type=="stats").stats.capture'
{"kernel_packets":102603516231,"kernel_packets_delta":13328638,"kernel_drops":21501885,"kernel_drops_delta":0,"errors":0,"errors_delta":0}

Let's make that a percentage kernel drop:
tail -n 1 stats.json | jq '.stats.capture.kernel_drops / .stats.capture.kernel_packets'
0.0002095900739094382 (~0.021%)
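
The jq expression above returns a ratio, so it has to be multiplied by 100 to read as a percentage. A minimal Python sketch of the same arithmetic, using the uptime and capture counters quoted above trimmed into one record (the function name summarize_stats is just for illustration):

```python
import json

# Stats record trimmed to the fields quoted in this post.
SAMPLE = (
    '{"event_type":"stats","stats":{"uptime":595932,'
    '"capture":{"kernel_packets":102603516231,"kernel_drops":21501885}}}'
)


def summarize_stats(line):
    """Return (uptime_days, drop_percent) for one stats record, else None."""
    record = json.loads(line)
    if record.get("event_type") != "stats":
        return None
    stats = record["stats"]
    capture = stats["capture"]
    packets = capture["kernel_packets"]
    # Ratio of dropped to seen packets, scaled to a percentage.
    drop_percent = 100.0 * capture["kernel_drops"] / packets if packets else 0.0
    return stats["uptime"] / 86400.0, drop_percent


if __name__ == "__main__":
    days, pct = summarize_stats(SAMPLE)
    print(f"uptime: {days:.1f} days, kernel drops: {pct:.3f}%")
```

The counters move between stats records, so the last digits will differ slightly depending on which record is read.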

When we loaded the ET Pro set we used suricatasc, so we need to check last_reload:
tail -n 1 stats.json-20191011 | jq -c 'select(.event_type=="stats")|.stats.detect.engines'
[{"id":0,"last_reload":"2019-10-07T18:57:51.607027+0000","rules_loaded":31887,"rules_failed":6}]

The failed rules are expected; fixing them is on my list :)

So since we loaded our rules, how many rules have we seen fire?

From our alert.json logs we have seen 104,638 alerts since we reloaded our rules. A sample of our dataset alert looks like:

{"timestamp":"2019-10-07T18:58:03.362167+0000","flow_id":1566205101704887,"in_iface":"p4p1","event_type":"alert","vlan":[245],"src_ip":"10.0.0.133","src_port":50201,"dest_ip":"8.8.8.8","dest_port":53,"proto":"UDP","tx_id":0,"alert":{"action":"allowed","gid":1,"signature_id":99070083,"rev":1,"signature":"DNS List Test","category":"","severity":3},"dns":{"query":[{"type":"query","id":30205,"rrname":"detectportal.firefox.com","rrtype":"A","tx_id":0}]},"app_proto":"dns","flow":{"pkts_toserver":1,"pkts_toclient":0,"bytes_toserver":84,"bytes_toclient":0,"start":"2019-10-07T18:58:03.362167+0000"},"stream":0,"packet_info":{"linktype":1},"host":"sensor01"}
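
For anyone who wants to reproduce the count, here is a rough Python sketch that tallies alert events logged at or after the last_reload timestamp, grouped by signature. It assumes one JSON record per line, as in the sample above; count_alerts_since is a name made up for this example, and the first sample record below is a hypothetical pre-reload alert added only to show the filter working.

```python
import json
from collections import Counter
from datetime import datetime

# Timestamp layout used in the records shown in this post.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def count_alerts_since(lines, reload_ts):
    """Count alert events at or after reload_ts, keyed by (sid, signature)."""
    since = datetime.strptime(reload_ts, TS_FORMAT)
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("event_type") != "alert":
            continue
        if datetime.strptime(event["timestamp"], TS_FORMAT) < since:
            continue  # fired before the reload, so not part of this run
        alert = event["alert"]
        counts[(alert["signature_id"], alert["signature"])] += 1
    return counts


if __name__ == "__main__":
    # Records trimmed to the fields the function reads.
    sample = [
        # Hypothetical alert from before the reload (not from the real logs).
        '{"timestamp":"2019-10-07T18:57:00.000000+0000","event_type":"alert",'
        '"alert":{"signature_id":99070083,"signature":"DNS List Test"}}',
        # The dataset alert shown above.
        '{"timestamp":"2019-10-07T18:58:03.362167+0000","event_type":"alert",'
        '"alert":{"signature_id":99070083,"signature":"DNS List Test"}}',
    ]
    counts = count_alerts_since(sample, "2019-10-07T18:57:51.607027+0000")
    for (sid, name), n in counts.most_common():
        print(sid, name, n)
```

Passing an open file handle for alert.json instead of the sample list works the same way, since the function only iterates over lines.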

So far it looks like performance is within our expectations. Next up is taking our ranked domain IOCs and putting them into a reputation list rule.

Monday, September 30, 2019

How big can a Suricata dataset be?

After poking around with datasets yesterday I was curious how many domain sha256 hashes we can load with the default dataset configuration.

The list came from https://www.domcop.com/top-10-million-domains and I just grabbed chunks with 'head -n' against the list of domains in the csv export (just the 2nd field in the csv, the actual domains).
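
The chunk-and-hash step can be sketched in Python roughly like this. It assumes the domain is the 2nd CSV field (as above), that the export starts with a header row, and that the list file should hold one hex-encoded SHA-256 per line, which is my understanding of the sha256 dataset file format. Lowercasing the domain before hashing is also an assumption; whatever normalization is used has to match what the rule hashes at match time. The function name and the header text in the sample are made up for illustration.

```python
import csv
import hashlib


def domain_hashes(csv_lines, limit, has_header=True):
    """Yield hex SHA-256 digests for up to `limit` domains (2nd CSV field)."""
    reader = csv.reader(csv_lines)
    if has_header:
        next(reader, None)  # assumption: the first row is a header
    emitted = 0
    for row in reader:
        if emitted >= limit:
            break
        if len(row) < 2 or not row[1].strip():
            continue  # skip malformed or empty rows
        domain = row[1].strip().lower()  # assumption: hash the lowercase name
        yield hashlib.sha256(domain.encode("utf-8")).hexdigest()
        emitted += 1


if __name__ == "__main__":
    # Hypothetical rows in the shape described above (rank, domain, score).
    sample = [
        '"Rank","Domain","Score"',
        '"1","example.com","10.00"',
        '"2","example.org","10.00"',
    ]
    for digest in domain_hashes(sample, limit=2):
        print(digest)
```

Writing the yielded digests to a file, one per line, gives the equivalent of a 'head -n' chunk of the full list.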

I was able to load 155,802 hashes/domains!

[11148] 30/9/2019 -- 21:48:12 - (datasets.c:275) <Config> (DatasetLoadSha256) -- dataset: dns-seen loaded 155802 records
[11148] 30/9/2019 -- 21:48:12 - (datasets.c:491) <Debug> (DatasetGet) -- set 0x55d95b6753e0/dns-seen type 3 save ./topdomains.155k.lst load ./topdomains.155k.lst
[11148] 30/9/2019 -- 21:48:12 - (datasets.c:577) <Debug> (DatasetsInit) -- dataset dns-seen: id 0 type sha256
[11148] 30/9/2019 -- 21:48:12 - (datasets.c:591) <Debug> (DatasetsInit) -- datasets done: 0x55d95b577850

At first I thought the limit might be tunable, but the suricata.log entries referencing memcap and hash-size just appear to be utility items from util-thash.c (I'll check with the OISF folks to make sure I am not missing something).

[10211] 30/9/2019 -- 21:12:56 - (conf.c:335) <Debug> (ConfGet) -- failed to lookup configuration parameter 'dns-seen.memcap'
[10211] 30/9/2019 -- 21:12:56 - (conf.c:335) <Debug> (ConfGet) -- failed to lookup configuration parameter 'dns-seen.hash-size'
[10211] 30/9/2019 -- 21:12:56 - (conf.c:335) <Debug> (ConfGet) -- failed to lookup configuration parameter 'dns-seen.hash-size'

I am not sure if there's a use case yet for that many items in a list, but it's nice to know the option is there :)

--- Update October 1, 2019 ---
The hash-size and memcap can be configured per dataset (thanks, Victor!). Example snippet:

datasets:
- dns-seen:
     type: sha256
     state: topdomains.lst

dns-seen:
  memcap: 1024mb
  hash-size: 1024mb

With the memcap and hash-size additions, I was able to load all 10,000,000 domain hashes. While this is hardly practical, it can be done :)

real   78m25.400s
user   78m2.285s
sys   0m4.419s

[4544] 1/10/2019 -- 21:09:31 - (host.c:276) <Config> (HostInitConfig) -- preallocated 1000 hosts of size 136
[4544] 1/10/2019 -- 21:09:31 - (host.c:278) <Config> (HostInitConfig) -- host memory usage: 398144 bytes, maximum: 33554432
[4544] 1/10/2019 -- 21:09:31 - (util-coredump-config.c:142) <Config> (CoredumpLoadConfig) -- Core dump size is unlimited.
[4544] 1/10/2019 -- 21:09:31 - (util-conf.c:98) <Notice> (ConfigGetDataDirectory) -- returning '.'
[4544] 1/10/2019 -- 21:09:31 - (datasets.c:219) <Config> (DatasetLoadSha256) -- dataset: dns-seen loading from './topdomains.lst'
[4544] 1/10/2019 -- 22:27:16 - (datasets.c:275) <Config> (DatasetLoadSha256) -- dataset: dns-seen loaded 10000000 records
[4544] 1/10/2019 -- 22:27:16 - (defrag-hash.c:245) <Config> (DefragInitConfig) -- allocated 3670016 bytes of memory for the defrag hash... 65536 buckets of size 56
[4544] 1/10/2019 -- 22:27:16 - (defrag-hash.c:272) <Config> (DefragInitConfig) -- preallocated 65535 defrag trackers of size 160
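
To put the load time in perspective, the two DatasetLoadSha256 lines above bracket the load, so the elapsed time and a rough records-per-second figure can be worked out from their timestamps. A small sketch (load_rate is a name made up for this example; the timestamp format string is my reading of the suricata.log layout shown above):

```python
from datetime import datetime

# Layout of the timestamps in the suricata.log lines above, e.g. "1/10/2019 -- 21:09:31".
LOG_TS_FORMAT = "%d/%m/%Y -- %H:%M:%S"


def load_rate(start, end, records):
    """Return (elapsed_seconds, records_per_second) between two log timestamps."""
    began = datetime.strptime(start, LOG_TS_FORMAT)
    ended = datetime.strptime(end, LOG_TS_FORMAT)
    elapsed = (ended - began).total_seconds()
    return elapsed, records / elapsed


if __name__ == "__main__":
    # "loading from" and "loaded 10000000 records" timestamps from the log above.
    elapsed, rate = load_rate("1/10/2019 -- 21:09:31", "1/10/2019 -- 22:27:16", 10_000_000)
    print(f"{elapsed:.0f} s elapsed, about {rate:.0f} records/s")
```

That works out to a little under 78 minutes for the load itself, in line with the 'real' time reported for the whole run.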