Monday, October 14, 2019

Reputation Lists and Datasets with Suricata

Another use case we were wondering about is matching domain IOCs against datasets using the reputation (datarep) option.

So I exported our domain IOCs from our intel platform into a CSV. To transform the IOC domains to SHA-256 hashes, I used this Python script.

The values in the original CSV are in domain,numerical_reputation format, so the script just grabs each domain, hashes it, joins it back up with its associated reputation, and writes it all out to a new file for use in our Suricata rule.
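
A minimal sketch of that transform might look something like this (the file names here are just placeholders):

import csv
import hashlib

# Input lines look like "evil.example.com,8.0"; output lines are
# "<sha256-of-domain>,<reputation>", the datarep list format.
with open("domainioc.csv", newline="") as src, open("domainioc.lst", "w") as dst:
    for domain, reputation in csv.reader(src):
        digest = hashlib.sha256(domain.encode("utf-8")).hexdigest()
        dst.write(f"{digest},{reputation}\n")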

Victor made some enhancements to the way the suricata.yaml dataset configuration gets parsed and loaded, so the suricata.yaml snippet looks a little different than it did in my original post.

Also, after some discussion with Jason Ish on the OISF team, I changed the location of our state files: all our lists now go into a common directory that is separate from our rules.
This is mostly to make sure nothing gets overwritten by rule set changes if there are naming collisions.

suricata.yaml snippet:

datasets:
  dns-seen:
    type: sha256
    load: /nsm/lists/topdomains.lst
    hash:
      memcap: 1024mb
      hash-size: 1024mb
      prealloc: 1024mb

  dns-ioc:
    type: sha256
    load: /nsm/lists/domainioc.lst
    hash:
      memcap: 256mb
      hash-size: 256mb
      prealloc: 256mb


The chunk I selected for domainioc.lst is just shy of 700,000 records. An example of the format for domainioc.lst:

d4c9d9027326271a89ce51fcaf328ed673f17be33469ff979e8ab8dd501e664f,8.0
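
Before loading, a cheap sanity check of the generated list can catch formatting mistakes; a rough sketch (the path is a placeholder):

with open("domainioc.lst") as fh:
    for n, line in enumerate(fh, 1):
        digest, score = line.rstrip("\n").split(",")
        int(digest, 16)   # hash must be valid hex
        assert len(digest) == 64, f"unexpected hash length on line {n}"
        float(score)      # reputation must be numeric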

Our test rule is:
alert dns any any -> any any (msg:"DNS IOC List Test"; dns.query; to_sha256; datarep:dns-ioc, >, 7.0; sid:1234; rev:1;)
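
Conceptually, here is what the rule does per DNS query, as a rough Python analogue rather than Suricata internals (the domain and table are made up):

import hashlib

# Made-up reputation table standing in for the dns-ioc dataset:
# sha256(domain) -> numerical reputation.
reputation = {
    hashlib.sha256(b"bad.example.com").hexdigest(): 8.0,
}

query = "bad.example.com"
digest = hashlib.sha256(query.encode("utf-8")).hexdigest()  # dns.query; to_sha256
if reputation.get(digest, 0.0) > 7.0:                       # datarep:dns-ioc, >, 7.0
    print("DNS IOC List Test would fire")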

Looking at the Suricata load time:
[13929] 11/10/2019 -- 19:12:21 - (suricata.c:1078) <Notice> (LogVersion) -- This is Suricata version 5.0.0-dev (a1ee536 2019-10-10) running in SYSTEM mode
[13929] 11/10/2019 -- 19:12:21 - (util-cpu.c:171) <Info> (UtilCpuPrintSummary) -- CPUs/cores online: 40

[14078] 11/10/2019 -- 19:21:03 - (source-af-packet.c:1802) <Perf> (AFPComputeRingParamsV3) -- AF_PACKET V3 RX Ring params: block_size=32768 block_nr=3126 frame_size=1600 frame_nr=62520 (mem: 102432768)
[14078] 11/10/2019 -- 19:21:04 - (source-af-packet.c:515) <Info> (AFPPeersListReachedInc) -- All AFP capture threads are running.

From those timestamps we are at about 8 minutes 43 seconds from startup to all capture threads running. This seems reasonable given how much data we are loading, but I am curious what others are seeing with their setups. I will run this by the OISF folks as well to get their thoughts.
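
As a quick sanity check, the elapsed time between those two log timestamps:

from datetime import datetime

start = datetime.strptime("11/10/2019 19:12:21", "%d/%m/%Y %H:%M:%S")
done = datetime.strptime("11/10/2019 19:21:04", "%d/%m/%Y %H:%M:%S")
print(done - start)  # 0:08:43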

Friday, October 11, 2019

More Dataset Performance Notes

-- Update: Forgot to mention we have a number of PTResearch rules too --

In the last post we covered the performance of a single rule using a large (10 million record) dataset.

Since it's been running for a few days now, I wanted to see what performance was like with a dataset-based rule using the same large dataset, but this time alongside our mix of VRT/ET/PTResearch/custom rules.

Here's where things stand.

Our current uptime is:
tail -n 1 stats.json | jq 'select(.event_type=="stats").stats.uptime'
595932 (~6.9 days)


Our current packet status:
tail -n 1 stats.json | jq -c 'select(.event_type=="stats").stats.capture'
{"kernel_packets":102603516231,"kernel_packets_delta":13328638,"kernel_drops":21501885,"kernel_drops_delta":0,"errors":0,"errors_delta":0}

Let's turn that into a kernel drop ratio:
tail -n 1 stats.json | jq '.stats.capture.kernel_drops / .stats.capture.kernel_packets'
0.0002095900739094382 (~0.021%)
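
For anyone who prefers Python over jq, the same drop calculation might look like this (assuming the last line of stats.json is the latest stats record, as above):

import json

# Grab the most recent record from the line-delimited stats.json.
with open("stats.json") as fh:
    capture = json.loads(fh.readlines()[-1])["stats"]["capture"]

ratio = capture["kernel_drops"] / capture["kernel_packets"]
print(f"{ratio} ({ratio * 100:.4f}%)")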

Since we loaded the ET Pro set with suricatasc (a live rule reload), we need to look at last_reload:
tail -n 1 stats.json-20191011 | jq -c 'select(.event_type=="stats")|.stats.detect.engines'
[{"id":0,"last_reload":"2019-10-07T18:57:51.607027+0000","rules_loaded":31887,"rules_failed":6}]

The failed rules are expected; it's on my list of things to fix :)

So since we loaded our rules, how many alerts have we seen fire?

From our alert.json logs we have seen 104,638 alerts since we reloaded our rules. A sample of our dataset alert looks like:

{"timestamp":"2019-10-07T18:58:03.362167+0000","flow_id":1566205101704887,"in_iface":"p4p1","event_type":"alert","vlan":[245],"src_ip":"10.0.0.133","src_port":50201,"dest_ip":"8.8.8.8","dest_port":53,"proto":"UDP","tx_id":0,"alert":{"action":"allowed","gid":1,"signature_id":99070083,"rev":1,"signature":"DNS List Test","category":"","severity":3},"dns":{"query":[{"type":"query","id":30205,"rrname":"detectportal.firefox.com","rrtype":"A","tx_id":0}]},"app_proto":"dns","flow":{"pkts_toserver":1,"pkts_toclient":0,"bytes_toserver":84,"bytes_toclient":0,"start":"2019-10-07T18:58:03.362167+0000"},"stream":0,"packet_info":{"linktype":1},"host":"sensor01"}

So far it looks like performance is within our expectations. Next up is taking our ranked domain IOCs and putting them into a reputation list rule.

Monday, October 7, 2019

Performance with a single large dataset and single rule

Our single large dataset that we loaded in this post has been running for about 4 days. Let's take a look at our stats from stats.json.

tail -n 1 stats.json | jq -c 'select(.event_type=="stats").stats.capture'
{"kernel_packets":56282979747,"kernel_packets_delta":12499348,"kernel_drops":9927231,"kernel_drops_delta":0,"errors":0,"errors_delta":0}

So if we do some quick math:
tail -n 1 stats.json | jq '.stats.capture.kernel_drops / .stats.capture.kernel_packets'
0.00017638069349960352

tail -n 1 stats.json | jq -c 'select(.event_type=="stats")|.stats.detect.engines'
[{"id":0,"last_reload":"2019-10-03T20:09:39.014088+0000","rules_loaded":1,"rules_failed":0}]

tail -n 1 stats.json | jq 'select(.event_type=="stats").stats.uptime'
339487 (~3.9 days)

So our sensor has processed 56,282,979,747 packets in the last ~4 days and has dropped 9,927,231 packets, which gives us a drop ratio of 0.00017638069349960352, or about 0.018%.

It should be noted that the only rule loaded is our dataset-based DNS rule, which means it's time to load the ET Pro rule set and see what performance looks like. :)


Thursday, October 3, 2019

How big can a Suricata dataset be? Take 2..

Continuing the dataset work: I had previously been testing on my laptop, and I finally got time to test this on real hardware.

The real hardware is just an off-the-shelf Dell R700-series box with 256GB of RAM.

snippet of lscpu output:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          4
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz

So of course the first thing to test is just HOW FAST can we load 10 million records :)

The answer is:
real 7m57.045s
user 7m52.763s
sys 0m3.446s

[16012] 3/10/2019 -- 18:33:55 - (datasets.c:219) <Config> (DatasetLoadSha256) -- dataset: dns-seen loading from '/etc/nsm/eint/lists/topdomains.lst'
[16012] 3/10/2019 -- 18:41:45 - (datasets.c:275) <Config> (DatasetLoadSha256) -- dataset: dns-seen loaded 10000000 records
[16012] 3/10/2019 -- 18:41:45 - (defrag-hash.c:248) <Config> (DefragInitConfig) -- allocated 3670016 bytes of memory for the defrag hash... 65536 buckets of size 56
[16012] 3/10/2019 -- 18:41:45 - (defrag-hash.c:273) <Config> (DefragInitConfig) -- preallocated 65535 defrag trackers of size 160
[16012] 3/10/2019 -- 18:41:45 - (defrag-hash.c:280) <Config> (DefragInitConfig) -- defrag memory usage: 14155616 bytes, maximum: 1350565888
[16012] 3/10/2019 -- 18:41:45 - (stream-tcp.c:399) <Config> (StreamTcpInitConfig) -- stream "prealloc-sessions": 2048 (per thread)
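
For anyone wanting to reproduce the load test, generating a synthetic list of the same shape is straightforward (these domains are made up; only the file format matters):

import hashlib

# Write one sha256 hash per line, the on-disk format for a sha256 dataset.
with open("topdomains.lst", "w") as fh:
    for i in range(10_000_000):
        domain = f"host{i}.example.com"
        fh.write(hashlib.sha256(domain.encode("utf-8")).hexdigest() + "\n")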

Now on to seeing about reputation lists...