[arin-discuss] Status of Investigations
Paul Vixie
paul at vix.com
Wed Jan 2 23:03:57 EST 2008
> I notice that you cut out the fact of Mr. Vixie being on the board of a
> spammer while directing MAPS. Doesn't his participation in spamming seem
> to be a conflict of interest, a "depressing reversal of ethics"?

i believe the company you're referring to here is "whitehat", which was not a spammer. it was never listed by spamhaus as such, at any rate.

i'll tell you how my relationship with whitehat came to be. rodney joffe, whose idea it was, came to me (and john levine and some other notable anti-spammers) and said, "ok guys, you say there's a way to run a responsible mass mailing company, you've written whitepapers on it, are you ready to put your money where your mouth is?" rodney's a very challenging man. what could i say except "youbetcherass i am."

i learned a lot about the problems responsible mass mail companies have -- it's HARD to be as responsible as is required! but it's achievable, and whitehat during my tenure there achieved it.

i am no longer affiliated with whitehat. i received no compensation while there, and earned neither royalties nor stock gains. (the company went through a merger and the new board did not have me on it. no biggie.)

here's the Spam chapter from Sendmail: Theory and Practice, 2nd edition, Vixie and Avolio, Digital Press; see http://smtap.al.org/ for ordering info. with due respect to my co-author, i wrote this chapter. i believed it at the time, i believe it now, and i've lived by it every day since i wrote, in august 1993, the following fateful words:

+---
| Return-Path: vixie
| Received: by gw.home.vix.com; id AA07212; Mon, 30 Aug 93 12:27:28 -0700
| Message-Id: <9308301927.AA07212 at gw.home.vix.com>
| To: WMILHEIM at PSUGV.PSU.EDU (William Milheim)
| In-Reply-To: Your message of Mon, 30 Aug 93 14:08:42.
|              <930830140842.20200594 at PSUGV.PSU.EDU>
| Date: Mon, 30 Aug 93 12:27:28 PDT
| From: Paul A Vixie <paul>
|
| William,
|
| I'm afraid I may not have expressed myself in adequate detail. What you did
| was wrong, and it is symptomatic of something hugely evil out on the horizon.
|
| The Internet is excruciatingly easy to use for mass mailing. Collecting
| addresses is free; generating mass mailings from them is close to free. Can
| you fathom the effect these metrics will permit once the Internet comes a
| little bit closer to the mass market?
|
| All of the folks who now bombard you with junk mail based on your magazine
| subscriptions; who now cause throwaway newspapers to be deposited in your
| driveways; who now call you during dinnertime with voice-activated
| computers attempting to solicit your vote or your willingness to test-market
| their products --
|
| -- all of these people are going to _thrive_ when they
| discover the Internet. You, with your mass-mailed survey, are paving the
| way for them and helping to _establish_ the answers to the very same
| "etiquette questions" you are trying to research.
|
| I receive about one of these surveys per the average month, sometimes more.
| You see your survey as an isolated instance and wonder why I complain; I see
| it as one more student sociology experiment by one more dippy professor who
| thinks the Internet is a "fertile ground for socio-environmental research."
|
| In spite of your intentions, which I knew in advance or at least assumed in
| advance to be "good", the effect of your survey is to hasten the Internet's
| downslide into common-market status. We must establish, here and every day
| thereafter, that unsolicited mass mailings are _strongly_prohibited_ by the
| Internet code of ethics.
|
| You can begin this process by posting an apology to the mailing lists you
| targeted in your original post. I am still waiting to see this done. I am
| not satisfied that you understand the problem or that the steps you have
| taken so far mitigate in any substantial way the damage you have caused.
| Act now.
|
| Paul
+---

here's the spam chapter in nroff source code.
---
.ds ID "$Id: main.me,v 1.3 2001/10/04 18:15:06 vixie Exp $"
.ds CT "Unwanted E-mail
.so ../ChapterStart.me
.lp
At the time of the first edition of this book (1992\(en1994), the e-mail world was a simpler place.
For one thing, the vast majority of the Internet's population were research and education institutions, or commercial entities that were on the Internet to support research and education users.
.pp
There were some abuses, in particular a few shady, bottom-feeding companies who converted the Usenet Map Project's data and the ARPAnet ``whois'' database into mailing lists and sold them.
Apparently, we in the Internet's ``operator'' class had made ourselves into an excellent demographic for direct mail \(em we were well-paid professionals, and purchasing managers both in our jobs and in our homes.
And we had conveniently (read: ``naively'') published several electronic directories so that we could locate each other in times of need.
These abuses, while irritating and insulting, were only the nose of the camel that was to come in later.
.pp
Most of the early Internet protocols required little or no authentication.
You did not need to prove who you were in order to initiate e-mail transactions simply because in those early days, everyone who had an Internet connection was self-selected or peer-selected to have good manners.
If you abused your ARPAnet connection then the government (which was paying for ARPAnet's operation) could disconnect you.
.pp
Gradually the government stepped away from Internet funding and the commercial communications sector took over.
Funding no longer comes from a single source but rather from all sources (and all destinations).
Without government money and supervision, the Internet became an organic, evolving, self-governed mesh of different entities, including the old research and education community but also including a fast-growing commercial data services sector.
.pp
Good manners stopped being a prerequisite to Internet connectivity!
It is difficult to overstate the significance of that seemingly small change.
Instead of being connected to a community for some bilateral purpose, it became possible to be connected for a completely unilateral purpose.
Neither the people nor the technology were prepared for this shattering change of affairs.
Network owners must still cooperate with each other in order that ``connectivity'' exist in a global sense \(em but end users need not cooperate with anybody in order to make use of that connectivity.
And in a disturbing trend, they increasingly don't.
.pp
As of this second edition (2000\(en2001), a significant and growing fraction of the e-mail received by the authors is so-called ``spam'' \(em unwanted and unsolicited nonpersonal ``junk'' mail.
The senders are sometimes paying their connectivity bills and are often breaking no laws, so they don't (as a class) see any problem with what they do.
E-mail is a _IT(lot) cheaper than postal mail, after all, and kills fewer trees.
Those of us who don't actually want this junk in our mailboxes are told to ``just hit delete'', or to register with a global ``opt out list'' which some spammers plan to filter against, or to request removal from each spammer's database when they first locate (and spam) us, or to stop worrying since ``this is a one-time mailing.''
.pp
We plan to do none of those things.
Universal connectivity is not a right \(em no one can be forced to accept traffic they don't want to receive.
We came into the Internet when the rule was ``never add someone to a mailing list without her permission,'' and for the authors, that rule still applies.
Anyone who wants connectivity without that rule can have all the connectivity she wants \(em to other people who also want connectivity without that rule.
We have no wish to prevent connectivity between consenting parties \(em anyone who wants to be spammed ought to be spammed.
But _IT(we) are not consenting parties to spamming.
To the extent possible, we will make sure that no spam enters, exits, or benefits from any network we operate.
.pp
If you're of like mind, and you want to use Sendmail to help enforce your rules on the networks and servers you operate, this chapter will show you how.
.sh +1 "Transmission
.lp
If you're an ISP and you want to prevent your resources from being used to send spam, there are a number of different areas you'll have to watch.
One is packet-level transport.
If the PPP users on your modem pool are able to initiate outbound SMTP sessions toward the greater Internet, then you can be sure that many of them, including some spammers, will do so.
Your Sendmail configuration won't be relevant in that case since the traffic is only passing through your terminal servers, switches, and routers.
Some ISPs have shut off outbound SMTP using a firewall in order to force their customers to use local SMTP relays rather than making direct connections.
It's also possible with some _SM(ISO-L4) equipment (that means smart switches and smart routers) to transparently intercept outbound SMTP and direct it to a local SMTP server.
Obviously these activities are beyond the scope of a book about Sendmail \(em so we'll just start by making the assumption that your local Sendmail server is somehow required to be involved in the outbound passage of e-mail.
.pp
The big thing you need is on by default: logging.
When someone outside your network complains to you that one of your users spammed them, it's very important that you be able to _EX(grep) your _EX(syslog) files to discover whether the transaction in question really did come through your service.
(It is all too common for the actual headers of e-mail spam to be forged, and occasionally these forgeries are designed to cast unearned guilt on someone \(em one of your customers, perhaps \(em rather than merely to avoid capture.)
Another fun thing to do with your _EX(syslog) files is to extract statistics about how many mail messages each of your users sends in a given period of time.
Note that this has privacy implications\** and that you should not be keeping track of who the mail is sent to on any statistical basis.
But it's useful for you to get a daily report showing how many e-mail messages were sent by each of your customers.
If someone who usually sends one message per week sent 50,000 messages last night, you're probably in for a long day.
.(f
\** Note that some privacy advocates strongly oppose forcing outbound mail to go through an ISP's relays, and we've even heard of _EX(uucp) being used over _EX(ssh) tunnels as a way to bypass this.
Fortunately, this kind of trick requires an outside party to cooperate, which won't be true for spam.
.)f
.pp
Beyond logging, your spam transmission problem boils down to a strong AUP\**
.(f
\** Acceptable Use Policy
.)f
which you'll bind to your service agreement and enforce with nonrefundable deposits.
Spammers are usually quite aware that nobody wants to receive what they want to send, and if you make it too easy for them to abuse your service, you can bet that they will do so.
The average professional spammer spends quite a bit of time identifying providers from which to inject their traffic, and if your _EX(abuse@) mailbox is not staffed on weekends or if the person who reads it isn't empowered to suspend and disconnect customers in real time, word will get around and your service will be repeatedly abused until\**
.(f
\** During this period, you can expect to receive a high volume of complaints from distant end-victims.
.)f
you make things better.
.pp
The one other thing you can do to make your resources less amenable to the transmission of spam is to limit the number of recipients each message is allowed to have.
Very few legitimate messages have more than a handful of envelope recipients.
If you set the _EX(MaxRecipientsPerMessage) option to something like 25 or so, only a few of your customers will notice, and they can be accommodated by moving them to whatever server you use for mailing lists and other legitimate bulk mail tasks.
A Sendmail consultant or any competent programmer can make _EX(MaxRecipientsPerMessage) into a per-user option rather than a global one.
Use the source \(em that's what it's for.
.sh +0 "Relay
.lp
OK, so let's say you've configured your routers and switches and terminal servers so that no outbound SMTP traffic is allowed except from your own Sendmail servers, and you've figured out how to watch those servers very carefully.
The next spam-related thing to watch out for is _IT(third party relay).
A third party in this case means someone who is neither an intentional agent for the sender, nor an intentional agent for the recipient.
The potential third party, your Sendmail server, must not forward (that is, _IT(relay)) e-mail traffic unless either the sender or recipient (or both) are customers, partners, or trusted affiliates.
In other words, don't be an _IT(unintentional) agent for _IT(any) sender or _IT(any) recipient.
.pp
Why is this important?
Let's revisit the average spammer's mindset\** and
.(f
\** Distasteful though it is.
.)f
note that they already know that nobody wants their spam and that any leased-line or colocated Internet connection they get won't last long if they use it to initiate spam.
So in order to deliver their unwanted spew as far from their nest as possible, they use unmonitored injection points such as the modem pools mentioned earlier.
However, these modem pools tend to have relatively low speeds.
If the goal is to send a 4Kbyte message to 500,000 victims, that's ~2Gbytes of traffic, which at 15Kbytes/second (ISDN) will take about 37 hours to transmit.
It's unlikely that the owner of that modem pool will endure 37 hours' worth of complaints without taking some kind of action.
.pp
So, enter the third party relay.
If a spammer knows of 25 mail servers which are willing to accept mail ``from the outside'' even though the destination will also be ``to the outside,'' then instead of initiating 500,000 SMTP transactions to 500,000 different victim servers they can initiate 5,000 SMTP transactions to 25 relays (that's 200 sessions per relay) with 100 recipients per SMTP transaction.
This is only 20Mbytes of outbound traffic and at ISDN speeds can be finished in about 22 minutes.
Obviously these are untuned numbers, and a motivated professional spammer would invest a lot of effort in finding out how many recipients each transaction should have, and how many transactions each relay should have, and so forth, in order to postpone detection while also maximizing throughput.
.pp
A professional spammer will also spend a lot of time searching for new relays, to try to spread their workload out as thinly as possible.
We estimate that there are tens of thousands of open relays at any given moment.
Every time one is closed down due to spam complaints, another is installed somewhere else.
Modern Sendmail is unwilling to relay for third parties by default, but for much of its history this was not the case.
A lot of those older servers are still out there, and many of them run unattended or are attended only by nontechnical personnel or by people who do not understand any language in which you can issue complaints.
It's a huge problem and we urge you to avoid making it worse: ensure that your Sendmail servers will only relay a piece of mail if it was sent by, or destined for, one of your customers, agents, or affiliates.
.pp
Because Sendmail's defaults are correct in modern versions, we won't go into great detail about how to prevent third party relay from occurring.
You'll be adding all your local domain names to _EX(/etc/mail/relay-domains) or some equivalent file, and Sendmail will search that file during the _EX(check_rcpt) ruleset, which is the earliest moment when the envelope sender, the initiator's IP address and domain name, and the envelope recipient are all known.
Mail which is coming from a trusted source or going to a trusted destination is considered OK.
Everything else resolves through the _EX(error) mailer, as in
.E+
R$*	$#error $@ 5.7.1 $: "550 Relaying denied"
.E-
which causes the _EX(RCPT) verb to fail during the _EX(SMTP) transaction, thus informing the sender that what they're trying to do won't work.
Let them move along and find a more willing accomplice for their evil deeds.
.sh +0 "Reception
.lp
Inbound spam is the most visible part of the overall spam problem, simply because of the time it takes to get it out of our own personal inboxes.
Back in December 2000, a personal friend whose domain has existed since the mid-1980s and is therefore listed on every ``millions of guaranteed fresh e-mail addresses for only $6.95'' _SM(CDROM) ever published, analyzed his _EX(syslog) and made the following report:
.(q
Here are some interesting stats for you...
You can see how bizarre it really is.
In 289,604 recipients, over 1843 spams, there are a maximum of 455 going to one recipient.
And a bogus one at that!
.)q
That was quite a while ago, before his problem got so bad that he could no longer host his own inbox on a T1 (1.5Mbit/second) line.
.pp
Inbound spam doesn't just hit the mailboxes of network and system operators, however.
We're not the real target \(em our customers are.
A few successful spam runs per night can consume tens of megabits of link capacity, and tens of gigabytes of mailbox capacity.
This can lead to service level complaints \(em ``Why is the network so slow?'' and ``What do you mean my mailbox is over quota?'' being two favourites.
Most providers are forced by spam to overprovision their links and their mail servers, in addition to whatever costs they incur in filtering the input stream and cleaning up whatever the filters miss.
.pp
So, how can Sendmail help?
Well, again, the modern defaults are pretty good.
Quite a bit of spam has something wrong with its headers \(em for example, the _EX(From:) domain won't exist.
Sendmail's defaults throw a lot of this kind of trash away with only a _EX(syslog) entry to mark its passing.
However, the spammers have a lot more time to work on their half of this problem (sending) than we do (filtering) and so any filters related to content (headers or body) are inherently weak.
.pp
One exception to this is Rhyolite _SM(DCC), the Distributed Checksum Clearinghouse\**.
_SM(DCC) is a set of freely available open source tools which bolt onto Sendmail via the recently introduced _EX(milter) interface and allow a Sendmail server to detect similarities between mail received ``here, now'' and mail received ``elsewhere, recently.''
Depending on how you configure it, _SM(DCC) can merely mark suspected spam with an _EX(X-DCC:) header, or drop it altogether, or simply requeue it so that Sendmail can do another _SM(DCC) lookup in 30 minutes to see if the degree of distributed similarity goes up during that period.
.(f
\** See _EX(http://www.rhyolite.com/anti-spam/dcc/) for details.
.)f
.pp
The _SM(DCC) tools are available completely free of charge, and include both the client and server parts so it's possible to set up your own _SM(DCC) cloud without necessarily connecting it to _SM(DCC) clouds being run by others \(em though we've found that the more Sendmail servers that participate in a given _SM(DCC) cloud, the more spam can be detected.
Note that in its most primitive form, _SM(DCC) can be used to detect and even block outbound mail as well, so it's not a purely inbound tool.
Also note, though, that _SM(DCC) detects _IT(bulk) e-mail rather than _IT(spam).
You have to tell _SM(DCC)'s ``whitelist'' about all known sources of legitimate bulk e-mail, such as mailing lists, customer newsletters, and so on.
This is not an unrealistic cost when compared to _SM(DCC)'s observed benefits.
.pp
Ultimately, the best-known way to keep inbound spam out of your network is to filter it based on its source rather than its content.
A number of bureaus of concerned Internet citizens now publish lists of known spam sources, and the publication format was designed for Sendmail to process in real time.
Sometimes this format is called by the name _SM(RBL)\(tm, which stands for Realtime Blackhole List.
_SM(RBL)\(tm is a service mark of Mail Abuse Prevention System LLC, who pioneered source-based spam filtering and who invented the format Sendmail now uses for accessing all _SM(RBL)\(tm-like lists.
To teach your Sendmail to check mail sources against a published blackhole list such as the MAPS RBL, add something like
.E. "FEATURE(`dnsbl', `blackholes.mail-abuse.org')"
to your _EX(*.mc) file and remake your _EX(*.cf) file.
Note that some lists are subscription-based and you should check with their publishers before you change your Sendmail configuration to depend on them.
It's possible to check your mail sources against more than one blackhole list, so for example you might reject mail from known spam sources using one list, and from known open relays using some other list.
There can be noticeable loss of performance when you check too many lists, though, since each one will add a DNS lookup to every inbound mail message you receive.
Some lists are available in bulk form in order to limit this performance loss \(em again, check with the list's publisher to find out what's possible before you configure your Sendmail.
.sh +0 "Definition
.lp
But what exactly _IT(is) e-mail spam, anyway?
You'll need a consistent definition that you use to classify both inbound and outbound traffic.
One of the questions which is still very much open in the minds of many folks is whether an e-mail message must be provably ``bulk'' before it can be considered spam.
If you adhere to the ``bulk'' standard, you won't be able to act on incoming complaints until you have more than one complaint about the same outbound e-mail message, and you'll be caught in an infinite regress trying to determine what ``the same'' means.
.pp
Far better, in the authors' view, to let a message be provably and deterministically ``spam'' or ``not spam'' based entirely on knowledge gained from a single complainer.
The standard for ``spamness'' which most embodies this principle can be found at _EX(http://mail-abuse.org/standard.html) and is reproduced here:
.(q
STANDARD:
.sp 0.5v
An electronic message is ``spam'' _SM(IF): (1) the recipient's personal identity and context are irrelevant because the message is equally applicable to many other potential recipients; _SM(AND) (2) the recipient has not verifiably granted deliberate, explicit, and still-revocable permission for it to be sent; _SM(AND) (3) the transmission and reception of the message appears to the recipient to give a disproportionate benefit to the sender.
.sp 0.5v
DISCUSSION:
.sp 0.5v
(i) Trivial or mechanised personalization such as ``Dear Mr. Jones, we see that you are the holder of the _SM(JONES.COM) domain'' does not make the personal identity of the recipient relevant in any way.
.sp 0.5v
(ii) Failing to click the ``do not send me marketing literature by e-mail'' button in a web sign-up form does not convey explicit permission.
Only when the default result is ``no followup e-mail'' _SM(AND) the inbox impact is clearly stated before any action which changes this result, can permission of this kind be conveyed.
.sp 0.5v
(iii) The appearance of disproportionate benefit to the sender, and the relevancy of the recipient's specific personal identity, are authoritatively determined by the recipient, and are not subject to argument or reinterpretation by the sender.
.sp 0.5v
(iv) Non-personal e-mail always places a disproportionate cost burden on the recipient, and is considered to disproportionately benefit the sender unless it was verifiably solicited or by the recipient's willing exception.
.sp 0.5v
(v) A message need not be offensive or commercial in order to fit the definition of ``spam.''
Content is irrelevant except to the extent necessary to determine personal applicability, consent, and benefit.
.)q
.pp
We've heard of arguments that such a standard places too much power in the hands of recipients.
In our view, recipients are paying the majority of the cost of e-mail transport, and thus ought to have the strongest voice in what's sent (or not) to them.
Besides, such an argument presumes that there's a piece of mail that a sender isn't certain was solicited.
Our advice is: _IT(don't send it then!).
---
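The chapter's Transmission section recommends a daily report of how many messages each customer sent, without recording who the mail went to. Below is a minimal Python sketch of that report. It assumes Sendmail's usual `from=<...>` syslog line, although the exact format varies by version and configuration. It also assumes the envelope sender is a good enough stand-in for "customer". The function names and the spike thresholds are illustrative and do not come from the chapter.

```python
import re
from collections import Counter

# ASSUMED log shape; Sendmail's syslog format varies by version and configuration:
#   Jan  2 23:03:57 mx sendmail[4242]: f03A3vX01234: from=<alice@example.net>, size=1843, ...
# Only "from=" lines are read, so recipients ("to=" lines) are never counted.
FROM_RE = re.compile(r"sendmail\[\d+\]: \w+: from=<?([^>,\s]*)>?,")


def messages_per_sender(lines):
    """Count accepted messages per envelope sender (used here as the customer identity)."""
    counts = Counter()
    for line in lines:
        match = FROM_RE.search(line)
        if match:
            # An empty sender is the null return path used by bounces.
            counts[match.group(1) or "<>"] += 1
    return counts


def unusual_senders(today, usual_per_day, factor=100, floor=1000):
    """Senders at or above `floor` messages today and `factor` times their usual daily rate.

    Both thresholds are illustrative placeholders, not recommendations from the chapter.
    """
    flagged = []
    for sender, sent in today.items():
        usual = usual_per_day.get(sender, 0)
        if sent >= floor and sent >= factor * max(usual, 1):
            flagged.append(sender)
    return sorted(flagged)
```

Suppose a customer usually sends one message a week, which is about 1/7 per day, and then sends 50,000 in a night. That customer passes both thresholds and shows up in `unusual_senders`.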
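The Transmission section suggests turning the global `MaxRecipientsPerMessage` option into a per-user setting by modifying the source. The Python below only models the intended policy so the behavior is easy to see. It is a hypothetical sketch, not a Sendmail patch. The user name `listserv` and its 5,000 limit are made up. The 452 reply is the temporary "too many recipients" response described in RFC 5321.

```python
# HYPOTHETICAL policy sketch: a per-user version of Sendmail's global
# MaxRecipientsPerMessage option. Names and numbers are illustrative only.
DEFAULT_MAX_RCPTS = 25


def max_rcpts_for(user, overrides):
    """Per-user limit, falling back to the global default."""
    return overrides.get(user, DEFAULT_MAX_RCPTS)


def reply_to_rcpt(user, accepted_so_far, overrides):
    """SMTP reply for the next RCPT TO of a message that already has `accepted_so_far` recipients.

    452 is a temporary failure, so a well-behaved client sends the remaining
    recipients in a later transaction instead of giving up on them.
    """
    if accepted_so_far >= max_rcpts_for(user, overrides):
        return "452 4.5.3 Too many recipients"
    return "250 2.1.5 OK"
```

Legitimate bulk senders get an override entry, mirroring the chapter's advice to move them to a dedicated list server. Everyone else inherits the default of 25.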
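The Relay section's throughput arithmetic can be checked in a few lines of Python. This is a back-of-the-envelope sketch using the chapter's own figures: a 4 Kbyte message, 500,000 recipients, about 15 Kbytes/second, and 25 relays taking 100 recipients per transaction. It uses decimal units and ignores protocol overhead.

```python
KB = 1000  # decimal units, matching the chapter's round figures


def injection_seconds(message_bytes, transactions, link_bytes_per_sec):
    """Seconds to push one copy of the message per SMTP transaction over one link.

    Ignores SMTP command traffic (HELO, MAIL, one RCPT per recipient) and
    TCP/IP overhead, so real runs take somewhat longer.
    """
    return message_bytes * transactions / link_bytes_per_sec


message = 4 * KB
link = 15 * KB              # roughly one ISDN line, as in the chapter
victims = 500_000

# Direct: one SMTP transaction (and one copy of the body) per victim.
direct = injection_seconds(message, victims, link)

# Via open relays: 100 recipients share each transaction, spread over 25 relays.
relays, rcpts_per_txn = 25, 100
txns = victims // rcpts_per_txn
via_relays = injection_seconds(message, txns, link)

print(f"direct:     {direct / 3600:.0f} hours")
print(f"via relays: {via_relays / 60:.0f} minutes ({txns // relays} sessions per relay)")
```

The script prints 37 hours for direct injection and 22 minutes for the relayed run. Counting the per-recipient RCPT commands and TCP/IP headers would push the relayed figure up somewhat.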
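The Relay section states the rule: accept a recipient if the source is trusted or the destination is one of ours, and otherwise fail the RCPT. That rule can be modeled in Python as follows. This is a hypothetical model of the decision only, not Sendmail's `check_rcpt` ruleset. The network and domain lists are placeholders. Treating a listed domain as also covering its subdomains is an assumption about how the relay list is meant to be read.

```python
import ipaddress


def _norm(domain):
    return domain.lower().rstrip(".")


def domain_matches(domain, domains):
    """True if `domain` equals, or is a subdomain of, any entry (subdomain matching is an assumption)."""
    d = _norm(domain)
    return any(d == _norm(e) or d.endswith("." + _norm(e)) for e in domains)


def check_rcpt(client_ip, rcpt, trusted_nets, relay_domains):
    """Return None to accept this RCPT TO, or an SMTP error reply to refuse it.

    HYPOTHETICAL model of the decision described in the chapter, not Sendmail's ruleset.
    Assumes fully qualified recipient addresses (user@domain).
    """
    ip = ipaddress.ip_address(client_ip)
    if any(ip in ipaddress.ip_network(net) for net in trusted_nets):
        return None  # trusted source: one of our customers, agents, or affiliates
    domain = rcpt.rpartition("@")[2]
    if domain and domain_matches(domain, relay_domains):
        return None  # trusted destination: mail we are supposed to receive
    return "550 5.7.1 Relaying denied"  # third-party relay: refuse at RCPT time
```

The error is returned at RCPT time rather than after the message body arrives. The spammer therefore learns immediately that the relay will not cooperate, and no bandwidth is spent on the body.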
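The Reception section describes DCC as comparing mail seen "here, now" with mail seen "elsewhere, recently". It notes that DCC detects bulk rather than spam, which is why known legitimate bulk senders must be whitelisted. The sketch below illustrates only that idea. It is a hypothetical, in-memory stand-in for a shared checksum cloud. Its checksum (normalize, then SHA-256) is far cruder than DCC's real checksums, and the threshold is arbitrary.

```python
import hashlib
import re
from collections import Counter


def body_checksum(body):
    """Crude normalization then SHA-256: lowercase, drop digits, collapse whitespace.

    Illustrative only; the point is that near-identical bodies map to one value.
    """
    text = re.sub(r"\d+", "", body.lower())
    text = " ".join(text.split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class BulkCounter:
    """HYPOTHETICAL shared cloud: counts reported by all participating servers."""

    def __init__(self, threshold, whitelist=()):
        self.threshold = threshold
        self.whitelist = set(whitelist)  # known legitimate bulk senders
        self.counts = Counter()

    def report(self, body, n_recipients=1):
        """Record a message and return how many copies the cloud has now seen."""
        checksum = body_checksum(body)
        self.counts[checksum] += n_recipients
        return self.counts[checksum]

    def classify(self, sender, body, n_recipients=1):
        seen = self.report(body, n_recipients)
        if sender in self.whitelist:
            return "accept"  # solicited bulk: mailing lists, newsletters
        return "bulk" if seen >= self.threshold else "accept"
```

The more servers report into one `BulkCounter`, the sooner a spam run crosses the threshold. That matches the chapter's observation that larger DCC clouds detect more.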
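The Reception section refers to the RBL-style format that Sendmail's `dnsbl` feature consumes. The usual DNS convention works like this:

1. Reverse the octets of the connecting IPv4 address.
2. Append the list's zone.
3. Look up an address record. Any answer means the address is listed.

The sketch below assumes that convention and uses a placeholder zone (`dnsbl.example.net`) rather than a real list. It does not show how Sendmail implements the feature internally. Treating lookup failures as "not listed" is a choice made for this sketch.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Reverse the IPv4 octets and append the list's zone: 192.0.2.99 -> 99.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


def is_listed(ip, zone):
    """Listed if the query name resolves; any lookup failure counts as not listed.

    Treating timeouts like NXDOMAIN ("fail open") is a design choice of this sketch.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False


def listed_on(ip, zones):
    """Every zone costs one more DNS lookup per check, which is the chapter's performance warning."""
    return [zone for zone in zones if is_listed(ip, zone)]
```

A caller might pass one zone for known spam sources and another for known open relays, as the chapter suggests.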
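The Definition section's standard combines three tests with AND, and a single recipient judges each one. Below is a minimal Python rendering. It assumes the complainer has already answered each test yes or no. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecipientView:
    """All three answers come from the recipient; the sender cannot argue them away."""

    identity_relevant: bool                # (1) is the message about *me*, not just addressed to me?
    verified_permission: bool              # (2) deliberate, explicit, still-revocable, verifiable consent
    sender_benefit_disproportionate: bool  # (3) as it appears to the recipient


def is_spam(view):
    """Spam only when all three conditions of the standard hold."""
    return (not view.identity_relevant
            and not view.verified_permission
            and view.sender_benefit_disproportionate)
```

Under discussion point (i), mechanised personalization still means `identity_relevant=False`. Under point (v), nothing here inspects whether the content is commercial or offensive.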