Monday, September 30, 2019

How big can a Suricata dataset be?

After poking around with datasets yesterday, I was curious how many domain sha256 hashes could be loaded with the default dataset configuration.

The list came from https://www.domcop.com/top-10-million-domains; I pulled the 2nd field out of the CSV export (the actual domains) and grabbed chunks with 'head -n'.
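
For the record, a rough Python equivalent of that extraction (the file names and the header-row assumption are mine, not the actual commands I ran):

import csv
from itertools import islice

# Pull the 2nd field (the domain) out of the domcop CSV export and
# keep the first N rows -- the same effect as cut + 'head -n'.
N = 155802
with open("top10milliondomains.csv", newline="") as src, \
        open("topdomains.155k.txt", "w") as dst:
    reader = csv.reader(src)
    next(reader, None)  # skip the header row, if there is one
    for row in islice(reader, N):
        dst.write(row[1] + "\n")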


I was able to load 155,802 hashes/domains!

[11148] 30/9/2019 -- 21:48:12 - (datasets.c:275) <Config> (DatasetLoadSha256) -- dataset: dns-seen loaded 155802 records
[11148] 30/9/2019 -- 21:48:12 - (datasets.c:491) <Debug> (DatasetGet) -- set 0x55d95b6753e0/dns-seen type 3 save ./topdomains.155k.lst load ./topdomains.155k.lst
[11148] 30/9/2019 -- 21:48:12 - (datasets.c:577) <Debug> (DatasetsInit) -- dataset dns-seen: id 0 type sha256
[11148] 30/9/2019 -- 21:48:12 - (datasets.c:591) <Debug> (DatasetsInit) -- datasets done: 0x55d95b577850


At first I thought it might be tunable, but the suricata.log entries referencing memcap and hash-size just appear to be utility items from util-thash.c. (I'll check with the OISF folks to make sure I am not missing something.)

[10211] 30/9/2019 -- 21:12:56 - (conf.c:335) <Debug> (ConfGet) -- failed to lookup configuration parameter 'dns-seen.memcap'
[10211] 30/9/2019 -- 21:12:56 - (conf.c:335) <Debug> (ConfGet) -- failed to lookup configuration parameter 'dns-seen.hash-size'
[10211] 30/9/2019 -- 21:12:56 - (conf.c:335) <Debug> (ConfGet) -- failed to lookup configuration parameter 'dns-seen.hash-size'


I am not sure there's a use case yet for that many items in a list, but it's nice to know the option is there :)

--- Update October 1, 2019 ---
The hash-size and memcap can be configured per dataset (thanks, Victor!). Example snippet:

datasets:
 - dns-seen:
     type: sha256
     state: topdomains.lst

dns-seen:
  memcap: 1024mb
  hash-size: 1024mb

With the memcap and hash-size additions, I was able to load all 10,000,000 domain hashes. (Back of the envelope: 10,000,000 sha256 values at 32 bytes each is about 320MB of raw key material before any hash table overhead, so the bigger memcap makes sense.) While this is hardly practical, it can be done :)

real    78m25.400s
user    78m2.285s
sys    0m4.419s

[4544] 1/10/2019 -- 21:09:31 - (host.c:276) <Config> (HostInitConfig) -- preallocated 1000 hosts of size 136
[4544] 1/10/2019 -- 21:09:31 - (host.c:278) <Config> (HostInitConfig) -- host memory usage: 398144 bytes, maximum: 33554432
[4544] 1/10/2019 -- 21:09:31 - (util-coredump-config.c:142) <Config> (CoredumpLoadConfig) -- Core dump size is unlimited.
[4544] 1/10/2019 -- 21:09:31 - (util-conf.c:98) <Notice> (ConfigGetDataDirectory) -- returning '.'
[4544] 1/10/2019 -- 21:09:31 - (datasets.c:219) <Config> (DatasetLoadSha256) -- dataset: dns-seen loading from './topdomains.lst'
[4544] 1/10/2019 -- 22:27:16 - (datasets.c:275) <Config> (DatasetLoadSha256) -- dataset: dns-seen loaded 10000000 records
[4544] 1/10/2019 -- 22:27:16 - (defrag-hash.c:245) <Config> (DefragInitConfig) -- allocated 3670016 bytes of memory for the defrag hash... 65536 buckets of size 56
[4544] 1/10/2019 -- 22:27:16 - (defrag-hash.c:272) <Config> (DefragInitConfig) -- preallocated 65535 defrag trackers of size 160

Sunday, September 29, 2019

Datasets with Suricata

Suricata recently introduced datasets, so I thought I would take a stab at using them and see what can be done with them.

From the 5.0.0rc1 release announcement:

"Still experimental at this time, the initial work to support datasets is part of this release. It allows matching on large amounts of data. It is controlled from the rule language and will work with any ‘sticky buffer’. https://suricata.readthedocs.io/en/suricata-5.0.0-rc1/rules/datasets.html
"

A lot of my recent research/work has been around DNS so I figured I would start there.

Step 1:

Get Suricata master or 5.0.0rc1 installed; the install-from-source instructions are here:
https://suricata.readthedocs.io/en/latest/install.html

Source tarballs can be retrieved from:
https://github.com/oisf/suricata/

If you install from GitHub, be sure to clone libhtp into the suricata repo directory; libhtp can be found here:
https://github.com/OISF/libhtp


On a Fedora 30 system, installing the build-time requirements should be something like:

sudo dnf install gcc gcc-c++ rust cargo libyaml-devel python3-pyyaml libnfnetlink-devel libnetfilter_queue-devel libnet-devel zlib-devel pcre-devel libcap-ng-devel lz4-devel libpcap-devel nspr-devel nss-devel nss-softokn-devel file-devel jansson-devel GeoIP-devel python3-devel lua-devel autoconf automake libtool

My particular install steps for Suricata from source (Fedora 30) with all the build requirements installed:


mkdir git
cd git
git clone https://github.com/OISF/suricata.git
cd suricata
git clone https://github.com/OISF/libhtp.git
./autogen.sh; ./configure --enable-gccprotect --enable-pie --disable-coccinelle --enable-nfqueue --enable-af-packet --with-libnspr-includes=/usr/include/nspr4 --enable-jansson --enable-geoip --enable-lua --enable-rust --enable-debug --enable-profiling --enable-rust-debug --enable-lzma
make
sudo make install


Step 2:

I chose to put part of the configuration in the suricata.yaml file, since setting up datasets seems like something that would be easier to do in the yaml with a configuration management setup (Chef, Ansible, Puppet, etc.).

So I tossed the following in the suricata.yaml just under the yaml header/boilerplate stuff:

datasets:
 - dns-seen:
     type: sha256
     state: dns-seen.lst

This tells Suricata that I want to set up a dataset named 'dns-seen', that it will contain sha256 values, and that the dataset's running/saved state will be stored in a file named dns-seen.lst.

So far so good.

Step 3:

Time to write a rule that will use the dataset. Since we are going to look for DNS queries, it only makes sense to use the dns.query sticky buffer. Time to be creative... :)

alert dns any any -> any any (msg: "dns list test"; dns.query; to_sha256; dataset:isset,dns-seen; sid:123; rev:1;)

This rule will write an alert (in our case to alert.json) for any traffic that the Suricata protocol parser determines is a DNS query for a domain in our dns-seen.lst file.

One thing worth noting here is the to_sha256 keyword; this is what Suricata calls a transform. This keyword tells Suricata to take whatever is in the dns.query buffer and calculate a sha256 hash of it. To say the least, transforms are quite useful in rule writing!

For more on transforms:
https://suricata.readthedocs.io/en/latest/rules/transforms.html
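
To make that concrete, here is my understanding of the equivalent computation in Python (a sketch, inferred from the docs and the workflow below, not pulled from Suricata's source):

import hashlib

# What to_sha256 does, as I understand it: a plain sha256 over the
# bytes in the dns.query buffer. The hex form of this value is what
# has to appear in dns-seen.lst for dataset:isset,dns-seen to match.
print(hashlib.sha256(b"google.com").hexdigest())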

Step 4:

Okay, so we have all the prep work done... well, not quite. Generally speaking, domain IOCs don't come to us as sha256 hashes. So what do we do?

We write some bad python:
https://github.com/jmtaylor90/dns2sha256

This simple script takes a file with domain names and writes out a file with the corresponding sha256 hash values.
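
The real script is in the repo above; a minimal sketch of the same idea (the file names and interface here are illustrative, not the script's actual ones):

import hashlib
import sys

# Minimal take on the dns2sha256 idea: read domains one per line,
# write the hex-encoded sha256 of each. As far as I can tell, that
# hex-per-line format is what the sha256 dataset type expects in
# its state file.
# Usage: python dns2sha256_sketch.py domains.txt dns-seen.lst
with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
    for line in src:
        domain = line.strip()
        if domain:
            dst.write(hashlib.sha256(domain.encode()).hexdigest() + "\n")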

In my case I selected google.com, slashdot.org, and reddit.com to use in the dns-seen.lst file.

Step 5:

Now that we have our hashes, we just need a pcap containing DNS queries for our domains. Using dig and tcpdump we can generate the traffic and capture the pcap.

Something like 'tcpdump -nn -i $activenetworkcard -w dnslisttest.pcap' and then:

dig google.com
dig reddit.com
dig slashdot.org
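
If dig isn't handy, the same lookups can also be triggered from Python while tcpdump is capturing (with the caveat noted in the comments):

import socket

# Trigger lookups for the test domains while tcpdump is capturing.
# Note: this goes through the system's stub resolver, so a warm
# local cache could mean no queries actually hit the wire.
for domain in ("google.com", "reddit.com", "slashdot.org"):
    socket.getaddrinfo(domain, None)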

Step 6:

Now we can replay our pcap through Suricata to see what happens. I ran the following:


[jason@dinosaur suri]$ rm *.json *.log; $(which suricata) -k none -c suricata.yaml -r dnslisttest.pcap

This just makes sure I don't have any old logs lying around, then runs Suricata with my configuration file and replays the pcap we captured in Step 5.

If I look at the alert.json file (eve log configured for alerts):
{"timestamp":"2019-09-28T20:38:06.536624-0400","flow_id":2206638181068848,"pcap_cnt":11,"event_type":"alert","src_ip":"172.16.42.7","src_port":38966,"dest_ip":"172.16.42.1","dest_port":53,"proto":"UDP","tx_id":0,"alert":{"action":"allowed","gid":1,"signature_id":123,"rev":1,"signature":"dns list test","category":"","severity":3},"dns":{"query":[{"type":"query","id":16741,"rrname":"google.com","rrtype":"A","tx_id":0}]},"app_proto":"dns","flow":{"pkts_toserver":1,"pkts_toclient":0,"bytes_toserver":93,"bytes_toclient":0,"start":"2019-09-28T20:38:06.536624-0400"},"payload":"QWUBIAABAAAAAAABBmdvb2dsZQNjb20AAAEAAQAAKRAAAAAAAAAMAAoACAI9cKIVTyJM","stream":0,"packet":"8nUnCb+QtLZ2CFMnCABFAABPXUkAAEARcSysECoHrBAqAZg2ADUAO6PpQWUBIAABAAAAAAABBmdvb2dsZQNjb20AAAEAAQAAKRAAAAAAAAAMAAoACAI9cKIVTyJM","packet_info":{"linktype":1}}
{"timestamp":"2019-09-28T20:38:01.959896-0400","flow_id":2191197773342104,"pcap_cnt":5,"event_type":"alert","src_ip":"172.16.42.7","src_port":38358,"dest_ip":"172.16.42.1","dest_port":53,"proto":"UDP","tx_id":0,"alert":{"action":"allowed","gid":1,"signature_id":123,"rev":1,"signature":"dns list test","category":"","severity":3},"dns":{"query":[{"type":"query","id":22527,"rrname":"slashdot.org","rrtype":"A","tx_id":0}]},"app_proto":"dns","flow":{"pkts_toserver":1,"pkts_toclient":0,"bytes_toserver":95,"bytes_toclient":0,"start":"2019-09-28T20:38:01.959896-0400"},"payload":"V\/8BIAABAAAAAAABCHNsYXNoZG90A29yZwAAAQABAAApEAAAAAAAAAwACgAIrWxDkiq\/JNg=","stream":0,"packet":"8nUnCb+QtLZ2CFMnCABFAABRTS4AAEARgUWsECoHrBAqAZXWADUAPe+oV\/8BIAABAAAAAAABCHNsYXNoZG90A29yZwAAAQABAAApEAAAAAAAAAwACgAIrWxDkiq\/JNg=","packet_info":{"linktype":1}}
{"timestamp":"2019-09-28T20:38:10.967322-0400","flow_id":781430593602202,"pcap_cnt":13,"event_type":"alert","src_ip":"172.16.42.7","src_port":39980,"dest_ip":"172.16.42.1","dest_port":53,"proto":"UDP","tx_id":0,"alert":{"action":"allowed","gid":1,"signature_id":123,"rev":1,"signature":"dns list test","category":"","severity":3},"dns":{"query":[{"type":"query","id":11178,"rrname":"reddit.com","rrtype":"A","tx_id":0}]},"app_proto":"dns","flow":{"pkts_toserver":1,"pkts_toclient":0,"bytes_toserver":93,"bytes_toclient":0,"start":"2019-09-28T20:38:10.967322-0400"},"payload":"K6oBIAABAAAAAAABBnJlZGRpdANjb20AAAEAAQAAKRAAAAAAAAAMAAoACHszHT2Q7UUp","stream":0,"packet":"8nUnCb+QtLZ2CFMnCABFAABPaqMAAEARY9KsECoHrBAqAZwsADUAO6btK6oBIAABAAAAAAABBnJlZGRpdANjb20AAAEAAQAAKRAAAAAAAAAMAAoACHszHT2Q7UUp","packet_info":{"linktype":1}}

It fired alerts!
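
Since each eve record is a single JSON object per line, a few lines of Python make the output easier to eyeball (this assumes the alert-only eve file is named alert.json, as in my config):

import json

# Quick way to see what matched: print the timestamp, signature,
# and queried name for each alert record in the eve output.
with open("alert.json") as f:
    for line in f:
        event = json.loads(line)
        if event.get("event_type") != "alert":
            continue
        for q in event.get("dns", {}).get("query", []):
            print(event["timestamp"], event["alert"]["signature"], q["rrname"])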

So this is pretty interesting functionality and I expect to test a lot more with it in the upcoming months.