Last time, we gathered some data:
~/found> ls -sh | grep total
total 87M
This time, let's find the goodies.
The first idea was to filter out duplicates. I thought, let's just hash the files and group identical hashes.
This didn't work at all: some files were pretty much the same, but small differences resulted in different hashes.
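For reference, the dedup attempt looked roughly like this (a minimal sketch, not the exact code I ran; `group_by_hash` is an illustrative name):

```python
import hashlib
import os
from collections import defaultdict

def group_by_hash(topdir):
    """Group file paths by the SHA-256 of their contents."""
    groups = defaultdict(list)
    for root, dirs, files in os.walk(topdir):
        for filename in files:
            path = os.path.join(root, filename)
            with open(path, 'rb') as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            groups[digest].append(path)
    return groups

# Any group with more than one path is a set of exact duplicates --
# but files differing by even a single byte hash completely differently,
# which is why this approach missed the near-duplicates.
```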
I thought - I am looking for password lists, so let's find all of those that have an email address on at least 80% of lines:
#!/usr/bin/env python3
import os
import re

email_re = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

if __name__ == "__main__":
    outdir = "./found"
    new_outdir = "./sorted"
    for root, dirs, files in os.walk(outdir):
        for filename in files:
            email_count = 0
            line_count = 0
            # some of the files are binary junk, so ignore decode errors
            with open(os.path.join(root, filename), 'r', errors='ignore') as f:
                data = f.read()
            for line in data.splitlines():
                # search, not match: the address isn't always at the start of the line
                if re.search(email_re, line):
                    email_count += 1
                line_count += 1
            # guard against empty files before dividing
            if line_count and email_count / line_count > 0.8:  # a lot of emails
                os.makedirs(os.path.join(new_outdir, "emails"), exist_ok=True)
                with open(os.path.join(new_outdir, "emails", filename), 'w') as f:
                    f.write(data)
This worked better, narrowing down the material to:
~/sorted/emails> ls -sh | grep total
total 4.3M
A quick look reveals that some files consist of nothing but one email address per line. I'm not interested in those, so I'll filter them out (exercise for the reader):
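If you want a starting point for that exercise, here is one way to detect such files (a sketch, not necessarily what I ran): a file where every non-empty line fully matches the email regex carries no extra data and can be skipped. The `emails_only` helper is mine.

```python
import re

# Same email pattern as the sorting script above.
email_re = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

def emails_only(data):
    """True if every non-empty line is nothing but a single email address."""
    lines = [l.strip() for l in data.splitlines() if l.strip()]
    return bool(lines) and all(re.fullmatch(email_re, l) for l in lines)
```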
~/sorted/emails_more> ls -sh | grep total
total 396K
There are only a couple false positives left, yay!
So, what did we get this way?
The classic email:pass format turned up in a couple of files, but only a few entries were there - 52 unique pairs total.
email:pass
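Counting the unique pairs is simple; a sketch (splitting on the first colon only, since passwords can themselves contain colons):

```python
def unique_pairs(lines):
    """Collect unique (email, password) pairs from email:pass lines."""
    pairs = set()
    for line in lines:
        line = line.strip()
        if ':' not in line:
            continue  # skip lines that aren't in email:pass form
        email, password = line.split(':', 1)  # passwords may contain ':'
        pairs.add((email, password))
    return pairs
```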
A couple of files (292KB) contained what seemed to be SMTP details. The lines were in the form server|port|email_addr|pass.
server|port|email_addr|pass
Since most of the ports were 587, I concluded that these are SMTP servers. There were a couple of port 25 entries too, in the exact same format.
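A sketch of how these pipe-delimited lines can be split out and tallied (the field names follow the format above; `parse_smtp_lines` is an illustrative name):

```python
from collections import Counter

def parse_smtp_lines(lines):
    """Parse server|port|email_addr|pass lines into unique creds plus a port histogram."""
    creds = set()
    ports = Counter()
    for line in lines:
        fields = line.strip().split('|')
        if len(fields) != 4:
            continue  # skip malformed lines
        server, port, email_addr, password = fields
        creds.add((server, port, email_addr, password))
        ports[port] += 1
    return creds, ports
```

The port histogram is what tells you at a glance that 587 (SMTP submission) dominates.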
I ended up with 4846 unique sets of creds. I'm not sure how old these are, but at least one of them has to work. As I am not looking to run a spam operation, I won't take a closer look.
I saw some Exims (mail servers) in there - there have been a couple of serious vulnerabilities in Exim lately, so I suspect this was that kind of campaign.
Worked exactly as expected: lots of trash, but some nuggets (368K out of 87M - that's 0.42%). I was digging through history to prove a point, but from an attacker's perspective it's most interesting to capture these as fresh as possible, iterating over many files in real time.
Stay tuned, as we'll take a look at more exotic findings in the next part. I'd like to explore other paths as well (YARA? An automatic verifier for live creds? Who knows?).