Hacker Trash Digging, Part 2

Last time, we gathered some data:

~/found> ls -sh | grep total
total 87M

This time, let's find the goodies.

Idea 1: Hashes for unique files

The first idea was to filter out duplicates: just hash the files and group the ones whose hashes match.

This didn't work at all. Some files were nearly identical, but small differences between them resulted in different hashes.
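A minimal sketch of the hash-and-group approach, assuming SHA-256 (the post does not say which hash was used). The helper names file_digest and group_by_hash are made up for illustration; ./found is the directory from the listing above.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_hash(top):
    """Map digest -> list of paths under `top` with exactly that content."""
    groups = defaultdict(list)
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            groups[file_digest(path)].append(path)
    return groups


if __name__ == "__main__":
    # "./found" is the data directory from the post
    for digest, paths in group_by_hash("./found").items():
        if len(paths) > 1:  # byte-for-byte duplicates only
            print(digest[:12], paths)
```

Any single-byte difference, such as a trailing space on one line, puts a file in its own group, which is why this approach misses near-duplicates.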

Idea 2: Lots of emails

I am looking for password lists, so let's find all the files that have an email address on at least 80% of their lines:

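#!/usr/bin/env python3
import os
import re

# Email pattern, compiled once; IGNORECASE so uppercase addresses count too
email_re = re.compile(
    r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?",
    re.IGNORECASE,
)

if __name__ == "__main__":
    outdir = "./found"
    new_outdir = "./sorted"
    for root, dirs, files in os.walk(outdir):
        for filename in files:
            # errors='replace' keeps binary or oddly encoded files from crashing the run
            with open(os.path.join(root, filename), 'r', errors='replace') as f:
                data = f.read()
            lines = data.splitlines()
            if not lines:
                continue  # empty file, avoid dividing by zero
            # count lines that contain an email address anywhere, not only at the start
            email_count = sum(1 for line in lines if email_re.search(line))
            if email_count / len(lines) >= 0.8:  # a lot of emails
                os.makedirs(os.path.join(new_outdir, "emails"), exist_ok=True)
                # note: files with the same name in different subdirectories overwrite each other
                with open(os.path.join(new_outdir, "emails", filename), 'w') as f:
                    f.write(data)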

This worked better, narrowing down the material to:

~/sorted/emails> ls -sh | grep total
total 4.3M

Idea 3: Lots of emails, but ignore clean emails

A quick look reveals that some files consist of nothing but one email address per line. I am not interested in those, so I'll filter them out (exercise for the reader):

~/sorted/emails_more> ls -sh | grep total
total 396K

There are only a couple of false positives left, yay!
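The filter itself is left as an exercise, so here is only one possible take on it, not the code that produced the numbers above. It assumes a "clean" file is one where at least 80% of the non-empty lines are a bare email address and nothing else; that threshold and the helper name is_clean_email_list are assumptions. The directory names come from the shell prompts in this post.

```python
import os
import re
import shutil

# Same email pattern as in the Idea 2 script, compiled case-insensitively.
EMAIL_RE = re.compile(
    r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
    r"@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?",
    re.IGNORECASE,
)


def is_clean_email_list(text, threshold=0.8):
    """True if at least `threshold` of the non-empty lines are a bare email.

    The 0.8 default is an assumption made for this sketch.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    # fullmatch: the whole line must be an email address, nothing before or after
    bare = sum(1 for line in lines if EMAIL_RE.fullmatch(line))
    return bare / len(lines) >= threshold


if __name__ == "__main__":
    src, dst = "./sorted/emails", "./sorted/emails_more"
    if os.path.isdir(src):
        os.makedirs(dst, exist_ok=True)
        for name in os.listdir(src):
            path = os.path.join(src, name)
            if not os.path.isfile(path):
                continue
            with open(path, "r", errors="replace") as f:
                text = f.read()
            if not is_clean_email_list(text):
                # keep only files where the emails come with something extra
                shutil.copy(path, os.path.join(dst, name))
```

Using fullmatch instead of search is the whole trick: a line like email:pass still contains an address, but it is not only an address, so the file survives the filter.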

Goodies

So, what did we get this way?

Just creds

The classic email:pass format showed up in a couple of files, but there were only a few entries: 52 unique pairs in total.
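Counting unique pairs can be sketched roughly like this. The simplified PAIR_RE pattern and the unique_pairs helper are assumptions made for illustration, and lowercasing the address before deduplicating is a design choice that may or may not match how the 52 was counted.

```python
import re

# Simplified pattern (an assumption for this sketch): an email address,
# a colon, then the rest of the line as the password.
PAIR_RE = re.compile(r"^([^\s:@]+@[^\s:@]+\.[^\s:@]+):(.+)$")


def unique_pairs(lines):
    """Return the set of distinct (email, password) tuples found in `lines`."""
    pairs = set()
    for line in lines:
        m = PAIR_RE.match(line.strip())
        if m:
            # lowercase the address so Alice@... and alice@... count once;
            # the password is kept as-is because it is case-sensitive
            pairs.add((m.group(1).lower(), m.group(2)))
    return pairs


if __name__ == "__main__":
    sample = [
        "Alice@example.com:hunter2",
        "alice@example.com:hunter2",  # duplicate after lowercasing
        "bob@example.org:a:b",        # password may contain colons
        "not a cred",
    ]
    print(len(unique_pairs(sample)))
```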

SMTP servers

A couple of files (292KB) contained what seemed to be SMTP details. The lines were in the form server|port|email_addr|pass.

Since most of the ports were 587, I concluded that these are SMTP servers. There were a couple of entries with port 25 too, in the exact same format.

I've got 4846 unique sets of creds. I'm not sure how old these are, but at least one of them has to work. As I am not looking to run a spam operation, I won't take a closer look at them.
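Parsing that format and tallying the ports could look something like the sketch below. The function names are hypothetical, the sample lines are placeholders rather than real data, and treating the server name as case-insensitive when deduplicating is an assumption.

```python
from collections import Counter


def parse_smtp_lines(lines):
    """Parse server|port|email_addr|pass lines into a set of unique 4-tuples."""
    creds = set()
    for line in lines:
        # maxsplit=3 keeps any '|' inside the password intact
        parts = line.strip().split("|", 3)
        if len(parts) != 4 or not parts[1].isdigit():
            continue  # not in the expected format
        server, port, email, password = parts
        # hostnames are case-insensitive, so normalise them before deduplicating
        creds.add((server.lower(), int(port), email, password))
    return creds


def port_histogram(creds):
    """Count how many unique credential sets use each port."""
    return Counter(port for _server, port, _email, _password in creds)


if __name__ == "__main__":
    sample = [
        "smtp.example.com|587|a@example.com|pw1",
        "SMTP.example.com|587|a@example.com|pw1",  # same set, different case
        "mail.example.org|25|b@example.org|p|w",   # '|' inside the password
        "garbage line",
    ]
    creds = parse_smtp_lines(sample)
    print(len(creds), dict(port_histogram(creds)))
```

A histogram like this is what makes the "mostly 587, a few 25" observation easy to check at a glance.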

I saw some Exim mail servers among them. There were a couple of serious Exim vulnerabilities lately, so I suspect these creds came from that kind of campaign.

Concluding

Worked exactly as expected. A lot of trash, but some nuggets (368K out of 87M, that's about 0.42%). I was digging through history to prove a point, but from an attacker's perspective it's most interesting to capture these as fresh as possible, iterating over many files in real time.

Stay tuned, as we'll take a look at more exotic findings in the next part. I'd like to explore other paths as well (YARA? An automatic verifier for live creds? Who knows?).