Observations on SPAM

Spammers are surprisingly dim. Or at least they appear so; maybe they have other constraints with which I am not familiar and they are, in fact, solving genuinely hard problems.

Not having any spam infrastructure to analyse, we are left to speculate, but here goes. We are looking at what PolicyD sees, which is essentially source and destination attributes. So, not the content of emails which, despite the valiant attempts of Bayesian filters, is usually a job for a human. Unless I've broken SpamAssassin.

Botnets

The most obvious source of spam is botnets. We can identify these because we'll see a slew of emails from a wide range of IP addresses but all with something in common. Almost always it is the same sender.

Botnets have gotten much better these days and will correctly use a HELO string of their external domain name or literal IP address ([a.b.c.d]). This still fails CheckHelo's RFC 2821 checks if a.b.c.d reverse-resolves to a domain name that doesn't have a forward lookup ("A reverse but no forward? Amazing!") or if whichever resolver they're using returns an internal domain name.

However, because they use the same sender, they are trivially at risk of a correctly set SPF record denying them its use.
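The "reverse but no forward" test can be sketched in a few lines of Python. This is not CheckHelo's code, just a minimal illustration of a forward-confirmed reverse lookup, assuming IPv4 only. The function names and the injectable `reverse`/`forward` parameters are mine, added so the logic can be exercised without touching real DNS.

```python
import socket

def helo_literal_ip(helo):
    """Return the address inside a literal such as '[192.0.2.1]', else None."""
    if helo.startswith("[") and helo.endswith("]"):
        return helo[1:-1]
    return None

def forward_confirmed(ip, reverse=None, forward=None):
    """True if ip reverse-resolves to a name that resolves back to ip.

    reverse and forward are hypothetical hooks; by default they use
    the standard library resolver calls.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        name = reverse(ip)          # PTR lookup
        return ip in forward(name)  # A lookup must lead back to the same IP
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError:
        # a missing reverse or a missing forward both count as a failure.
        return False
```

A HELO of [192.0.2.1] would be unwrapped with helo_literal_ip() and the result passed to forward_confirmed().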

You rarely see many emails from the same IP address.

That said, there have been botnets which switched modes from the "try sending an email at a time from each of several IP addresses" to the visually more aggressive "send lots of emails from one IP address." I guess that's either a random decision to switch attack mode or perhaps the spam software detected some weakness in the destination system from a particular IP address and tried to exploit it.
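Both patterns are easy to pick out once you have (client IP, sender) pairs to hand. The sketch below assumes you have already extracted those pairs from your logs or database; the function names and thresholds are illustrative, not anything PolicyD provides.

```python
from collections import Counter, defaultdict

def botnet_like_senders(events, min_ips=3):
    """Senders seen from at least min_ips distinct client IPs.

    events is an iterable of (client_ip, sender) pairs, one per attempt.
    """
    ips_per_sender = defaultdict(set)
    for client_ip, sender in events:
        ips_per_sender[sender].add(client_ip)
    return sorted(s for s, ips in ips_per_sender.items() if len(ips) >= min_ips)

def noisy_clients(events, min_messages=3):
    """Client IPs responsible for at least min_messages attempts."""
    per_ip = Counter(client_ip for client_ip, _sender in events)
    return sorted(ip for ip, n in per_ip.items() if n >= min_messages)
```

The first function catches the usual "one sender, many addresses" case, the second the more aggressive "many emails, one address" mode.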

Usually the destination addresses are a set of users, often including historic addresses and odd constructions: bob@example.com might become okbob@example.com or bobn@example.com. That suggests corrupted results from scraping bulletin boards and the like in the rosy old days when email addresses were visible.

Usually, greylisting will defer these emails and we'll never see them again. We'll transiently accumulate senders and sender IP addresses in the database which will get cleared out eventually.
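For anyone unfamiliar with the mechanism, here is a minimal in-memory sketch of greylisting, assuming the usual (client IP, sender, recipient) triplet as the key. The class, the parameter names and the timings are mine and are not PolicyD's configuration; a real implementation would keep this state in its database.

```python
class Greylist:
    def __init__(self, min_delay=300, max_age=86400):
        self.min_delay = min_delay  # seconds a new triplet must wait
        self.max_age = max_age      # seconds before an entry is cleared out
        self.first_seen = {}        # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient, now):
        """Return 'DEFER' for new or too-eager triplets, otherwise 'OK'."""
        key = (client_ip, sender, recipient)
        first = self.first_seen.get(key)
        if first is None:
            self.first_seen[key] = now
            return "DEFER"
        if now - first < self.min_delay:
            return "DEFER"
        return "OK"

    def expire(self, now):
        """Drop entries older than max_age; return how many were removed."""
        stale = [k for k, t in self.first_seen.items() if now - t > self.max_age]
        for k in stale:
            del self.first_seen[k]
        return len(stale)
```

A botnet that never retries leaves one entry per triplet behind, which is exactly the transient accumulation that expire() is there to tidy up.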

Interestingly, you don't often get any actual email from these systems. I wonder if they're simply scanning for valid email addresses to be sold on to others.

Another spin from spammers takes a leaf from legitimate email. If a legitimate system wants to track whether an email address is still valid then it will set the sender to something recipient-specific. So a message to bob@example.com might be sent from bounce-bob=example.com@corp.example. Here, if the email address bob@example.com is no longer valid (or has some other issue), a bounce message will be sent to the sender, bounce-bob=example.com@corp.example, and the remote system can deconstruct the recipient of that bounce, bounce-bob=example.com, and determine that the problem was with the message sent to bob@example.com. Of course, in a more sophisticated age, the user component, bounce-bob=example.com, might be encoded in some way or even be a random ID that can be matched to bob@example.com in a database.

Spammers are starting to do the same. In this case, there is a tendency to reuse the same domain name and so you would want to place constraints on the domain name component rather than the specific email address.

Talking of bounce messages... These are identified by having a NULL sender. You'll see this in the logs most often as a blank (a blank is hard to spot in itself, but from=, is easier to search for) and sometimes as <>. It is matched in PolicyD with an @ on its own. As an aside, by using a NULL sender the sending system is suggesting that it doesn't care for a reply if this message fails to reach a destination, which in turn prevents a snowball effect of repeatedly bouncing "no such user" messages.

You would think that spammers might utilise NULL senders more as a means to sneak round any policies we set, as the email can contain a Sender or Reply-To header which would be used in preference to the From header anyway. But they don't. Maybe they used to and email systems have cottoned on and so it is handled elsewhere.
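The encoding described above (commonly known as VERP, Variable Envelope Return Path) is simple enough to sketch. The function names and the default "bounce" prefix below are illustrative; real systems vary in the separators they use and, as noted, may use an opaque ID instead.

```python
def verp_sender(recipient, bounce_domain, prefix="bounce"):
    """Build a recipient-specific envelope sender."""
    local, _, domain = recipient.rpartition("@")
    return f"{prefix}-{local}={domain}@{bounce_domain}"

def verp_recipient(bounce_address, prefix="bounce"):
    """Recover the original recipient from a bounce address, or None."""
    local, _, _bounce_domain = bounce_address.rpartition("@")
    head = prefix + "-"
    if not local.startswith(head) or "=" not in local:
        return None
    # The last '=' separates the original local part from its domain.
    user, _, domain = local[len(head):].rpartition("=")
    return f"{user}@{domain}"
```

So verp_sender("bob@example.com", "corp.example") gives bounce-bob=example.com@corp.example, and verp_recipient() turns that back into bob@example.com.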
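If you are grepping logs for these, a small helper saves squinting. The sketch below assumes log fragments of the form from=<address>, or from=address, and treats an empty address as the NULL sender; the exact log layout on your system may differ, and the function names are mine.

```python
import re

# Matches from=<addr> as well as a bare from=addr, including the empty case.
_FROM = re.compile(r"\bfrom=<?([^>,\s]*)>?")

def envelope_sender(log_line):
    """Return the envelope sender in log_line ('' for NULL), or None."""
    match = _FROM.search(log_line)
    return None if match is None else match.group(1)

def is_null_sender(log_line):
    """True only if the line has a from= field and it is empty."""
    return envelope_sender(log_line) == ""

def policy_spec(sender):
    """Map a sender to the form used for matching: '@' alone for NULL."""
    return "@" if sender == "" else sender
```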

Mitigations

What can we do? What should we do? Obviously, any invalid email address, okbob@example.com, will get rejected later when the system discovers there is no such user to whom the message can be delivered. Which is fine but we've wasted a bit of time and resource getting there. We know okbob has been used countless times by spammers before (because we spend our days watching the mail logs, don't we?) so we could use it as a marker. I'd like to say "honeypot" but that suggests that we've constructed something and tricked the spammers into using it whereas it seems that the spammers have simply resold bad data to each other over the years.
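Using a known-bad recipient as a marker might look something like the following sketch. The marker list, the tuple layout and the function name are all assumptions for illustration; the idea is only that any attempt to reach a marker address tells us something about the client and sender making it.

```python
# Hypothetical marker addresses: recipients that only spammers ever use.
MARKERS = {"okbob@example.com", "bobn@example.com"}

def suspect_clients(attempts, markers=MARKERS):
    """Map each client IP that hit a marker to the senders it used.

    attempts is an iterable of (client_ip, sender, recipient) tuples.
    """
    flagged = {}
    for client_ip, sender, recipient in attempts:
        if recipient.lower() in markers:
            flagged.setdefault(client_ip, set()).add(sender)
    return flagged
```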

OK, so what can we do? Can we ban IP addresses? Yes, but I understand there are quite a few IP addresses out there and there's only so much room in our database. We can ban subnets for a bit. By and large, though, banning the IP addresses of members of a botnet is a waste of time. It is legitimate if you've discovered a cranky bot wielding its own form of data vengeance upon you but those seem to be fewer in number. Perhaps that's more relevant to a higher profile site.

Can we ban the sender? Yes, we can add the sender to a policy where we apply AccessControl and REJECT/DISCARD the email. That'll certainly cut down on later processing.

However, there's a PolicyD problem. The AccessControl module doesn't have any auto-management of its contents. The botnet will get bored and move on after a few hours and all we're doing is accumulating utterly random sender addresses which will cost us computing time to test against for every email.
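What we would want instead is a list that forgets. As a sketch of the idea only (AccessControl has nothing like this, which is the point), here is an in-memory blocklist where each sender expires a fixed time after it was last seen. The class name and the default lifetime are mine.

```python
class ExpiringBlocklist:
    def __init__(self, ttl=6 * 3600):
        self.ttl = ttl        # seconds of silence before a sender is dropped
        self._last_hit = {}   # sender -> time it was added or last matched

    def add(self, sender, now):
        self._last_hit[sender] = now

    def is_blocked(self, sender, now):
        """Check a sender, refreshing it on a hit and dropping it if stale."""
        last = self._last_hit.get(sender)
        if last is None:
            return False
        if now - last > self.ttl:
            del self._last_hit[sender]
            return False
        self._last_hit[sender] = now
        return True

    def purge(self, now):
        """Remove every stale sender; return how many were removed."""
        stale = [s for s, t in self._last_hit.items() if now - t > self.ttl]
        for s in stale:
            del self._last_hit[s]
        return len(stale)

    def __len__(self):
        return len(self._last_hit)
```

Refreshing on each hit keeps a sender blocked for as long as the botnet keeps using it and lets it fall away a few hours after the botnet gets bored.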
