The Future of SPAM

How might things get worse

The advantage we have over spammers is that they are not very sophisticated and can be defeated by relatively trivial validity tests. The danger is, then, that spam infrastructures will improve and the world will become a worse place for it.

Here are my fairly obvious thoughts.

Assumptions

I don't know much about botnets other than that they exist and send me spam. They are presumably enabled by weak/lapsed security of machines accessible either directly from the Internet (rare?) or indirectly because they are exposed by malicious software unleashed internally on a network. If I open a dodgy attachment (click on a popup etc.) then every machine my current machine can see on my internal LANs is at risk. Once one of them has been compromised then everything it can see is at risk.

You would assume that a compromised machine carries a simple payload whose job is to contact a server and download the full payload. It's hard to imagine a small email attachment containing even a simple MTA.

At this point we have a key gain for the following discussion: a remote machine with some form of inbound access to it can act as a server.

You would also assume that this bot-server is itself a patsy, nothing special in the larger command and control network, as I find it hard to believe anyone remotely sophisticated would leave their home address embedded in a bazillion email attachments. In that case, the email attachment probably contains the IP addresses of a number of remote bot-servers: try each one until you find one that works.

Note

As an aside, this is exactly how the DNS works. Your OS distribution ships a bootstrap file (the "root hints") listing the names and addresses of the root servers. Your OS distribution could be many years old and several of the root servers could have been and gone (or just be offline temporarily), so you try each of the bootstrap addresses in turn until one answers, at which point you can ask it for the up-to-date list of root servers.
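The "try each bootstrap address until one answers" step (for a DNS resolver, the priming query) is simple to sketch. The `query` callable is a hypothetical stand-in for a real DNS lookup, and the addresses are from the documentation ranges, not real root servers.

```python
from typing import Callable, Iterable, Optional, Sequence


def prime_from_hints(
    hints: Iterable[str],
    query: Callable[[str], Optional[Sequence[str]]],
) -> tuple[str, list[str]]:
    """Try each bootstrap address in turn; the first one that answers
    supplies the up-to-date list of servers."""
    for address in hints:
        try:
            answer = query(address)
        except OSError:
            continue  # unreachable or timed out: this hint has "been and gone"
        if answer:
            return address, list(answer)
    raise RuntimeError("no bootstrap address answered")


# Usage with a fake query function standing in for the network.
def fake_query(address: str) -> Optional[list[str]]:
    if address == "192.0.2.1":
        raise OSError("timed out")
    if address == "192.0.2.2":
        return None  # answered, but with nothing useful
    return ["198.51.100.1", "198.51.100.2"]


print(prime_from_hints(["192.0.2.1", "192.0.2.2", "192.0.2.3"], fake_query))
# ('192.0.2.3', ['198.51.100.1', '198.51.100.2'])
```

Only one entry in an old list needs to be alive for the whole scheme to work, which is why stale bootstrap lists are tolerable.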

An addendum to that thought is that, given a relatively rapid turnover of such bot-servers, the email attachments containing a list of bootstrap bot-servers must change equally rapidly over time as new bot-server IP addresses replace the old.

We should also assume that the bot is not directly connected to the Internet but is on a private LAN behind a router.

Being behind a router on a private LAN does not prevent you from upgrading yourself from a bot to a bot-server. If the router has UPnP enabled (normally for games and messaging services) then the bot can ask the router to open an inbound port to itself then publish that it is now a bot-server to the command and control network.

Bot-Servers

The problem with firewalls is that once you've opened an (inbound) hole then you've allowed anything inbound. Any kind of service whatsoever. Which is a bit remiss.

You do get content-aware (content-snooping) firewalls but those need to be configured with knowledge of the content -- which could be anything, as we shall see.

Let us suppose that our proto-bot-server has persuaded the router to open inbound access on what will be some random port number. Suppose we even have to claim that we will be operating SMTP, say. We'll use SMTP as an example of a published, well-known standard with plenty of support.

Encryption

For a start, despite the best efforts of some firewalls to silently squelch it, an SMTP server can advertise STARTTLS in its reply to the "extended hello" command, EHLO, after which the dialogue between client and server is encrypted with TLS and the firewall cannot perform any snooping.

Extensions

Whether encrypted or not, most protocols offer some degree of extensibility, and the reply to EHLO lists the extensions the SMTP server (the receiving end) supports.

As well as the usual extensions, SMTP can offer Private-Use Commands, all named X-something-or-other:

may be used by bilateral agreement between the client (sending) and server (receiving) SMTP agents

—RFC 5321, Section 4.1.5

Hmm, without taxing one's brain too much we might envisage the bot-server enabling extensions such as XMAINPAYLOAD, XMYEXTERNALIP, XDOMAIN-YOUVE-PUT-ME-IN-SPF-RECORD-FOR.

All of which are entirely above board in the SMTP spec. In fact, the equivalent of XMYEXTERNALIP is already available via STUN servers (notably used for SIP behind NAT routers).

I should point out that the SMTP specification is not at fault here, in that it is positioned as a messaging enabler. The problem is more that once a hole in the firewall is opened it is prohibitively difficult to determine whether the dialogue between two hosts is legitimate or shady.
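To make the EHLO discussion concrete, here is a sketch of how an observer might read a server's EHLO reply and pick out STARTTLS and any private-use keywords. The reply lines are invented, XMYEXTERNALIP and XMAINPAYLOAD are the hypothetical extensions named earlier (not real ones), and the function names are illustrative.

```python
def parse_ehlo_reply(lines: list[str]) -> dict[str, list[str]]:
    """Turn the lines of a multi-line 250 reply to EHLO into
    {KEYWORD: [parameters]}. The first line carries the server's
    domain and greeting, not an extension, so it is skipped."""
    extensions: dict[str, list[str]] = {}
    for line in lines[1:]:
        code, text = line[:3], line[4:]  # "250-KEYWORD params" or "250 KEYWORD"
        if code != "250":
            raise ValueError(f"unexpected reply line: {line!r}")
        if not text.strip():
            continue
        keyword, *params = text.split()
        extensions[keyword.upper()] = params
    return extensions


def summarise(extensions: dict[str, list[str]]) -> dict[str, object]:
    """Report whether TLS is on offer and which private-use (X...) keywords appear."""
    return {
        "starttls": "STARTTLS" in extensions,
        "private_use": sorted(k for k in extensions if k.startswith("X")),
    }


reply = [
    "250-mail.example.com greets client.example.org",
    "250-SIZE 35882577",
    "250-8BITMIME",
    "250-STARTTLS",
    "250-XMYEXTERNALIP",
    "250 XMAINPAYLOAD",
]
print(summarise(parse_ehlo_reply(reply)))
# {'starttls': True, 'private_use': ['XMAINPAYLOAD', 'XMYEXTERNALIP']}
```

An X-keyword on its own proves nothing: some legitimate MTAs advertise their own (Postfix, for example, can offer XCLIENT and XFORWARD to trusted clients), and once STARTTLS is taken the firewall never sees the rest of the exchange.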

IP Address

When a bot sends me spam, PolicyD's first line of defence is CheckHelo, which verifies that the HELO string is valid. CheckHelo enforces RFC2821, which says the HELO string must be either:

The literal IP address of the sender: [a.b.c.d]
The FQDN of the sender.
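The two rules above can be sketched as follows. This is not CheckHelo's actual code but a hypothetical check that handles IPv4 only (IPv6 literals, written [IPv6:...], are not handled). Requiring the address literal to match the connecting IP is a local policy choice that the sketch makes explicit.

```python
import ipaddress
import re

# One DNS label: letters, digits and hyphens, 1-63 characters, no leading
# or trailing hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def helo_is_valid(helo: str, client_ip: str) -> bool:
    """Accept an IPv4 address literal matching the connecting client,
    or a syntactically valid fully qualified domain name."""
    if helo.startswith("[") and helo.endswith("]"):
        try:
            return ipaddress.IPv4Address(helo[1:-1]) == ipaddress.ip_address(client_ip)
        except ValueError:
            return False
    name = helo.rstrip(".")
    labels = name.split(".")
    return (
        len(name) <= 253
        and len(labels) >= 2                  # "localhost" is not an FQDN
        and all(_LABEL.fullmatch(label) for label in labels)
        and not labels[-1].isdigit()          # a bare, unbracketed IP is not a name
    )


for helo in ("mail.example.com", "[192.0.2.10]", "localhost", "192.0.2.10"):
    print(helo, helo_is_valid(helo, "192.0.2.10"))
```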

This requires the bot to know the IP address the rest of the world sees it as. This is surprisingly subtle. My router won't tell me [1], so I need to make a connection to someone out on the Internet and ask them what IP address I'm coming from.

That's a really trivial problem for the remote server (and an early network systems programming task, see getpeername(2)).
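On the server side, "what address are you coming from?" really is a thin wrapper around getpeername(2), which Python exposes as `socket.getpeername()`. A minimal sketch, run entirely over loopback so it answers 127.0.0.1; an Internet-facing server would instead see the client's public (post-NAT) address. The function names are hypothetical.

```python
import socket
import threading


def serve_peer_address(listener: socket.socket) -> None:
    """Accept one connection and send back the address it came from."""
    conn, _ = listener.accept()
    with conn:
        host, _port = conn.getpeername()[:2]  # IPv6 peers return a 4-tuple
        conn.sendall(host.encode("ascii") + b"\n")


def ask_my_address(server: tuple[str, int]) -> str:
    """Connect to the server and read back one line: our address as it sees it."""
    with socket.create_connection(server, timeout=5) as s, s.makefile("rb") as f:
        return f.readline().decode("ascii").strip()


def demo() -> str:
    """Run the server on loopback in a thread and ask it who we are."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_peer_address, args=(listener,))
        worker.start()
        address = ask_my_address(("127.0.0.1", port))
        worker.join()
    return address


print(demo())  # 127.0.0.1
```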

There are a few services that currently do that for you but they are surprisingly rare. The bot needs a bot-server to do this for it. Given that it downloaded its main payload from some bot-server, presumably it can ask the same (or an alternate) remote bot-server for its own external IP address.

Either the bot or the bot-server can then do some rudimentary forward and reverse lookups on that IP address to see if there is a verifiable FQDN associated with it. Curiously, this isn't done (yet). Bots will send spam with an unverifiable FQDN -- yes, rather bizarrely, some IP addresses have a PTR record for which there is no corresponding A record. Lackadaisical DNS management!
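The forward-and-reverse check described above is usually called forward-confirmed reverse DNS. A sketch assuming the system resolver via Python's `socket` module; the lookups are parameters so they can be swapped for fakes, and all names (functions and the example hosts) are hypothetical.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional


def system_reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]  # PTR lookup


def system_forward(name: str) -> set[str]:
    return {info[4][0] for info in socket.getaddrinfo(name, None)}  # A/AAAA lookup


def forward_confirmed_rdns(
    ip: str,
    reverse: Callable[[str], str] = system_reverse,
    forward: Callable[[str], Iterable[str]] = system_forward,
) -> Optional[str]:
    """Return the PTR name for `ip` only if that name resolves back to `ip`."""
    try:
        name = reverse(ip)
        addresses = {ipaddress.ip_address(a) for a in forward(name)}
    except (OSError, ValueError):
        return None  # no PTR, no A/AAAA, or an unparseable answer
    return name if ipaddress.ip_address(ip) in addresses else None


# Fake DNS data: the second address has a PTR record but no matching A record.
PTR = {"192.0.2.25": "mail.example.com", "198.51.100.7": "dsl-7.isp.example"}
A = {"mail.example.com": ["192.0.2.25"]}


def fake_reverse(ip: str) -> str:
    if ip not in PTR:
        raise socket.herror(ip)
    return PTR[ip]


def fake_forward(name: str) -> list[str]:
    if name not in A:
        raise socket.gaierror(name)
    return A[name]


print(forward_confirmed_rdns("192.0.2.25", fake_reverse, fake_forward))    # mail.example.com
print(forward_confirmed_rdns("198.51.100.7", fake_reverse, fake_forward))  # None
```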

It won't be long before that loophole is closed. Given that RFC2821 (CheckHelo's reference) allows the first option above, simply using the literal IP address is enough.

That said, even major messaging organisations can't get their DNS sorted.

SPF

PolicyD's second major check is CheckSPF.

The spam will be from some sender. We can use SPF to verify that the bot's IP address is allowed to send email for that domain.

Amazingly, botnets do not make this check themselves, so their spam is very easy to filter out.
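The check itself is cheap. Below is a deliberately cut-down sketch of SPF evaluation (the full rules are in RFC 7208) that understands only the ip4, ip6 and all mechanisms with their +, -, ~ and ? qualifiers. The record is passed in as a string because fetching the TXT record needs a DNS library the Python standard library lacks. Anything else makes this sketch give up with "permerror", which a real evaluator would not do for, say, include:.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record: str, client_ip: str) -> str:
    """Evaluate one SPF record against a client IP, supporting only the
    ip4, ip6 and all mechanisms (a simplification of RFC 7208)."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        result = QUALIFIERS.get(term[0], "pass")  # no qualifier means "+"
        if term[0] in QUALIFIERS:
            term = term[1:]
        mechanism, _, value = term.partition(":")
        mechanism = mechanism.lower()
        if mechanism == "all":
            return result
        if mechanism in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            # An ip4 term never matches an IPv6 client, nor vice versa.
            if ip.version == network.version and ip in network:
                return result
            continue
        return "permerror"  # include:, a, mx, redirect= ... are beyond this sketch
    return "neutral"  # nothing matched and no "all"


record = "v=spf1 ip4:192.0.2.0/24 ~all"
for ip in ("192.0.2.77", "203.0.113.9"):
    print(ip, check_spf(record, ip))
```

A domain whose record ends in ~all or ?all only ever yields a soft verdict for unlisted senders, which is the loophole discussed next.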

Two things spring immediately to mind:

botnets will start to use domains with no SPF entry or with an SPF entry with a soft fail verdict.
botnets will start to use their own domains. Ones they've hijacked, of course, but essentially ones under their control.

The first should, over time, become increasingly hard as organisations become more rigorous with SPF.

The second is much harder to counter, as I can't think of a way for a resolver to distrust a DNS domain. We need a DNS blacklist for domains that have gone rogue.
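For what it's worth, domain-keyed blocklists are conventionally queried the same way as IP blocklists (RFC 5782): look up `<domain>.<list zone>`, treat an A record in 127.0.0.0/8 as "listed" and NXDOMAIN as "not listed". A sketch; the zone `dbl.example.net` is hypothetical and the resolver is injectable so it can be faked.

```python
import socket
from typing import Callable


def domain_listed(
    domain: str,
    zone: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True if `domain` is listed in the DNS blocklist `zone`."""
    query = f"{domain.rstrip('.')}.{zone}"
    try:
        answer = resolve(query)
    except socket.gaierror:
        return False  # NXDOMAIN (or no answer): not listed
    # Listings answer within 127.0.0.0/8; the exact address is list-specific.
    return answer.startswith("127.")


def fake_resolve(name: str) -> str:
    if name == "spammer.example.dbl.example.net":
        return "127.0.1.2"
    raise socket.gaierror(name)


print(domain_listed("spammer.example", "dbl.example.net", fake_resolve))  # True
print(domain_listed("corp.example", "dbl.example.net", fake_resolve))     # False
```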

DNS

Until then, control over DNS will become an increasingly valuable commodity. This could be gained at a high level (stealing credentials for a registrar's website) or more mechanically by compromising a DNS master. The latter, I would imagine, is quite rare in Unix-land, where the implementations are probably quite bespoke, but much more problematic in Windows-land, where the DNS implementations will be a monoculture (and the machines more likely to be compromised in the first place).

Once you have control over a domain then you can start adding all your bots as genuine hosts (possibly with PTR records if you have control over the reverse domain too) or create SPF entries which include your bots. You needn't worry too much about the number of bots as you can trivially extend your domain with subdomains (aa.example.com, ab.example.com etc.) and divide your bots across the SPF records therein.

This is a double whammy for anti-spam software, as much of it assumes that the domains in email addresses (the part after the @) are orthogonal, that is, that aa.example.com is wholly independent of ab.example.com. Under most circumstances this is true (example.com and corp.example are two separate, unrelated domains), but with our trivial example of a spammer-controlled DNS we would be forced to apply new and, more importantly, fresh analysis to ab.example.com despite the fact that it is clearly related to the spammer's aa.example.com.

This is partly because, outside the original top-level domains [2], real organisational distinction starts at the third level in the hierarchy (corp1.co.example and corp2.co.example), so anti-spam measures cannot tell where, in a given name, independently owned labels give way to labels belonging to one organisation. Knowing nothing about .example (especially as it is an RFC 2606 reserved name), can you determine if a.example and b.example are related, or a.something.example and b.something.example, or a.some.thing.example and b.some.thing.example?
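A common workaround is a curated list of public suffixes, such as Mozilla's Public Suffix List, which DMARC also relies on to find an "organizational domain". The sketch below uses a tiny, hypothetical suffix set (including the made-up co.example from the text) and shows both the idea and its weakness: the answer is only as good as the list.

```python
from collections import defaultdict

# Hypothetical, hand-picked suffixes for illustration only. A real system
# would load the full Public Suffix List instead.
PUBLIC_SUFFIXES = {"com", "uk", "co.uk", "example", "co.example"}


def organisational_domain(domain: str, suffixes: set[str] = PUBLIC_SUFFIXES) -> str:
    """Longest matching public suffix plus one more label."""
    labels = domain.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        # i == 0 tries the whole name, so the first hit is the longest suffix.
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[max(i - 1, 0):])
    return ".".join(labels)  # unknown suffix: nothing to group on


def group_by_organisation(domains: list[str]) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = defaultdict(list)
    for domain in domains:
        groups[organisational_domain(domain)].append(domain)
    return dict(groups)


print(group_by_organisation(
    ["aa.example.com", "ab.example.com", "corp1.co.example", "corp2.co.example"]
))
```

Remove co.example from the set and corp1.co.example and corp2.co.example collapse into a single "organisation", co.example: the list is doing all the work.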

With control of DNS on the spammer's side, much of the trivial defence against SPAM is lost.

Greylisting

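PolicyD's third check is Greylisting, which is arguably the most superficial and yet one of the most effective tools. It temporarily rejects the first delivery attempt for an unfamiliar combination, typically (client IP, sender, recipient), relying on legitimate mail servers to retry later while spamware usually doesn't bother.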

A spammer need only retry a spam email once to defeat greylisting! How hard will that be?

In fact, don't make it a case of retry: just duplicate your list and start again at the beginning. Every system where you passed CheckHelo and CheckSPF but were greylisted will now let you through.
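The core of greylisting fits in a few lines, which also shows why a second pass defeats it. A sketch assuming a (client IP, sender, recipient) key and a 300-second minimum delay, both hypothetical choices; real implementations also expire old entries and whitelist known-good senders, which this omits. The clock is injectable for testing.

```python
import time
from typing import Callable


class Greylist:
    """Temporarily reject the first attempt from each (client IP, sender,
    recipient) triplet; accept attempts made at least `delay` seconds later."""

    def __init__(self, delay: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self.delay = delay
        self.clock = clock
        self.first_seen: dict[tuple[str, str, str], float] = {}

    def check(self, client_ip: str, sender: str, recipient: str) -> str:
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            return "accept"
        return "tempfail"  # an MTA would answer with a 4xx code, e.g. 450


# One spam run, then the same list again 10 minutes later.
now = [0.0]
grey = Greylist(clock=lambda: now[0])
print(grey.check("192.0.2.1", "spam@aa.example.com", "me@corp.example"))  # tempfail
now[0] = 600.0
print(grey.check("192.0.2.1", "spam@aa.example.com", "me@corp.example"))  # accept
```

By the time the second pass over the list reaches a recipient, that triplet's first attempt is older than the delay, so it sails through.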

Quotas

PolicyD's final check is email restriction through quotas.

This is genuinely hard for a spammer to beat as it is a subjective measure determined locally by each mail system administrator. (And then modified on a whim [3].) You can't just pass or fail and you won't face the same restrictions from one site to the next.

The flip side of that for mail system administrators is that there isn't a defined policy. It's up to you to figure out something that works for your system and profile of email volume/rates.

That's a good thing in that mail system administrators own, and therefore care about, how it works. It's a bad thing in that it requires mail system administrators to own it -- you can't just turn it on and walk away.
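To make "rolling" concrete, here is a sketch of a rolling-window quota that allows at most `limit` messages per key in any `window`-second span. The key (a sender address, client IP or whatever you choose), the limits and the class name are all hypothetical; this is not PolicyD's implementation.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Hashable


class RollingQuota:
    """Allow at most `limit` messages per key in any `window`-second span."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent: dict[Hashable, deque] = defaultdict(deque)  # key -> accepted timestamps

    def allow(self, key: Hashable) -> bool:
        now = self.clock()
        stamps = self.sent[key]
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()  # forget messages that have slid out of the window
        if len(stamps) >= self.limit:
            return False  # rejected attempts are not recorded
        stamps.append(now)
        return True


now = [0.0]
quota = RollingQuota(limit=2, window=60, clock=lambda: now[0])
for t in (0, 10, 20, 61, 65, 70):
    now[0] = t
    print(t, quota.allow("spam@aa.example.com"))
```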

Other Modules

PolicyD also supports AccessControl, a go/no-go module with no dynamism or automation, and Accounting, which is quotas but in a fixed time window (rather than the rolling time window used by Quotas).
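For contrast with a rolling window, a fixed-window counter (the behaviour attributed to Accounting above) simply buckets messages by period. A hypothetical sketch, with periods aligned to multiples of `period` since the epoch; old buckets are never purged here.

```python
import time
from collections import Counter
from typing import Callable, Hashable


class FixedWindowCounter:
    """Allow at most `limit` messages per key in each fixed `period`."""

    def __init__(self, limit: int, period: float = 3600.0,
                 clock: Callable[[], float] = time.time):
        self.limit = limit
        self.period = period
        self.clock = clock
        self.counts: Counter = Counter()  # (key, period index) -> messages counted

    def allow(self, key: Hashable) -> bool:
        bucket = (key, int(self.clock() // self.period))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True


now = [0.0]
acct = FixedWindowCounter(limit=2, period=60, clock=lambda: now[0])
results = []
for t in (58, 59, 60, 61, 62):
    now[0] = t
    results.append(acct.allow("spam@aa.example.com"))
print(results)  # [True, True, True, True, False]
```

Four messages get through in four seconds despite a limit of two per minute, because the burst straddles a period boundary; a rolling window would have refused the third.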

Future Restrictions

We're clean out of other ways to restrict delivery of SMTP, largely because of legacy systems. We need a new protocol for delivering messages, keeping SMTP but holding it at arm's length: "thanks for your SMTP message, we'll deliver it in due course."

We can't (read: shouldn't) eliminate SMTP as it is a trivial protocol and very useful for lightweight systems to pass messages around.

Pay Per Mail

Various ideas around pay per mail have been tossed about. Not a literal payment (OK, maybe in some politicians' eyes but not in the real world) but a resource payment. Suppose that before I accept an email from you I demand that you compute some resource intensive bit of number crunching to prove you're legitimate.

This is called a Proof of Work system, and a particular example, originally designed for email and similar to the mining requirements in Bitcoin, is Hashcash.
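The asymmetry Hashcash relies on is easy to show: minting a stamp takes about 2^bits SHA-1 evaluations, checking it takes one. Below is a simplified sketch of a version-1 style stamp (ver:bits:date:resource:ext:rand:counter). Real Hashcash encodes rand and counter in base64 and also checks the date and a double-spend database, all of which this omits; the function names are hypothetical.

```python
import hashlib
import itertools
import secrets
from datetime import datetime, timezone


def has_leading_zero_bits(stamp: str, bits: int) -> bool:
    """True if SHA-1(stamp) starts with at least `bits` zero bits."""
    digest = hashlib.sha1(stamp.encode("ascii")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits) == 0


def mint(resource: str, bits: int = 20) -> str:
    """Expensive for the sender: try counters until the hash qualifies."""
    date = datetime.now(timezone.utc).strftime("%y%m%d")
    rand = secrets.token_hex(8)
    for counter in itertools.count():
        stamp = f"1:{bits}:{date}:{resource}::{rand}:{counter:x}"
        if has_leading_zero_bits(stamp, bits):
            return stamp


def verify(stamp: str, resource: str, min_bits: int) -> bool:
    """Cheap for the receiver: one hash plus some field checks."""
    parts = stamp.split(":")
    if len(parts) != 7 or parts[0] != "1" or parts[3] != resource:
        return False
    if not parts[1].isdigit() or int(parts[1]) < min_bits:
        return False
    return has_leading_zero_bits(stamp, int(parts[1]))


stamp = mint("me@corp.example", bits=16)
print(stamp, verify(stamp, "me@corp.example", 16))
```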

The problem here is that the botnet has oodles of computing power available to it -- it's taken over countless machines after all -- so any resource costs are moot.

You can't propose putting a time-limit on answering the question either. If there are automated systems that can present CAPTCHA images to real humans and collect the replies to enable access to a site, then you can't imagine it being hard to farm a difficult (cost/resource-wise) computation out to compromised fast machines. In addition, a time-limit on computation means you are disenfranchising the little guy: no more Raspberry Pis sending email?

Messaging Protocols

What about a different beast? From the venerable chat through IRC and the multitude of messaging protocols today, many of which support file transfer, is this the way forward?

The problem there is the multitude part. The vast majority are proprietary protocols, even those based on XMPP (especially those based on XMPP!). There's no agreement, only a determination to "win" the messaging war.

Any alternative must be federated, that is, there must be an agreed standard for interoperability between systems for a messaging protocol to be any use. How many of the messaging protocols interoperate? Almost none: once you've signed up with Example Corp's messaging system you can message other users on Example Corp's system, but you have to switch to Global Search Corp's messaging system to message anyone on its system.

It doesn't matter how each system implements SMTP internally; they all speak the same SMTP to each other. That's the key.

Blockchains

What about using a blockchain for email? This is a bit more interesting. Blockchains are famous from the likes of Bitcoin, where transactions and the relative order of those transactions are kept in a ledger that is broadcast to the community. The ledger is lossy in that it doesn't require all records; instead, members of the community come to a consensus about a universal truth.

That lossy state is also the undoing of blockchains in that if enough people agree to a certain truth then their larger vote carries the day.

However, the idea that previous (email) transactions can be stored in a distributed community-held database and new (email) transactions can be verified against what has been done before is a useful possibility.
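The "verify against what has been done before" part rests on a hash chain, the data structure underneath a blockchain. The sketch below covers only that tamper-evident chaining, with none of the distribution or consensus; the record fields (`from`, `to`, `ts`) and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append(ledger: list[dict], record: dict) -> None:
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def ledger_is_intact(ledger: list[dict]) -> bool:
    """Recompute every link; editing any earlier record breaks the chain."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


def seen_before(ledger: list[dict], sender: str, recipient: str) -> bool:
    """Has this sender/recipient pair exchanged mail before?"""
    return any(
        e["record"]["from"] == sender and e["record"]["to"] == recipient
        for e in ledger
    )


ledger: list[dict] = []
append(ledger, {"from": "alice@corp1.example", "to": "bob@corp2.example", "ts": 1})
append(ledger, {"from": "bob@corp2.example", "to": "alice@corp1.example", "ts": 2})
print(ledger_is_intact(ledger), seen_before(ledger, "spam@aa.example.com", "bob@corp2.example"))
```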

[1]	Doubtless some HTTP REST query or perhaps SNMP request will return it but that becomes router/model/version specific.

[2]	With the advent of the ICANN-era generic top-level domains we've no hope of making any semantic headway.

[3]	Well, that's the case for me!
