[arin-discuss] Status of Investigations

Wed Jan 2 23:03:57 EST 2008

> I notice that you cut out the fact of Mr. Vixie being on the board of a
> spammer while directing MAPS. Doesn't his participation in spamming seem
> to be a conflict of interest, a "depressing reversal of ethics"?

i believe the company you're referring to here is "whitehat", which was not
a spammer.  it was never listed by spamhaus as such, at any rate.  i'll tell
you how my relationship with whitehat came to be.  rodney joffe, whose idea
it was, came to me (and john levine and some other notable anti-spammers)
and said, "ok guys, you say there's a way to run a responsible mass mailing
company, you've written whitepapers on it, are you ready to put your money
where your mouth is?"  rodney's a very challenging man.  what could i say
except "youbetcherass i am."  i learned a lot about the problems responsible
mass mail companies have -- it's HARD to be as responsible as is required!

but it's achievable, and whitehat during my tenure there achieved it.  i am
no longer affiliated with whitehat, having received no compensation, and
having earned neither royalties nor stock gains while there.  (the company
went through a merger and the new board did not have me on it.  no biggie.)

here's the Spam chapter from Sendmail, Theory and Practice, 2nd edition,
Vixie and Avolio, Digital Press, see http://smtap.al.org/ for ordering info.
with due respect to my co-author, i wrote this chapter, and i believed it
at the time, and i believe it now, and i've lived by it every day since i
wrote, in august 1993, the following fateful words:

+---
| Return-Path: vixie
| Received: by gw.home.vix.com; id AA07212; Mon, 30 Aug 93 12:27:28 -0700
| Message-Id: <9308301927.AA07212 at gw.home.vix.com>
| To: WMILHEIM at PSUGV.PSU.EDU (William Milheim)
| In-Reply-To: Your message of Mon, 30 Aug 93 14:08:42.
|              <930830140842.20200594 at PSUGV.PSU.EDU> 
| Date: Mon, 30 Aug 93 12:27:28 PDT
| From: Paul A Vixie <paul>
| 
| William,
| 
| I'm afraid I may not have expressed myself in adequate detail.  What you did
| was wrong, and it is symptomatic of something hugely evil out on the horizon.
| 
| The Internet is excruciatingly easy to use for mass mailing.  Collecting
| addresses is free; generating mass mailings from them is close to free.  Can
| you fathom the effect these metrics will permit once the Internet comes a
| little bit closer to the mass market?
| 
| All of the folks who now bombard you with junk mail based on your magazine
| subscriptions; who now cause throwaway newspapers to be deposited in your
| driveways; who now call you during dinnertime with voice-activated
| computers attempting to solicit your vote or your willingness to test-market
| their products --
| 
|                -- all of these people are going to _thrive_ when they
| discover the Internet.  You, with your mass-mailed survey, are paving the
| way for them and helping to _establish_ the answers to the very same
| "etiquette questions" you are trying to research.
| 
| I receive about one of these surveys per the average month, sometimes more.
| You see your survey as an isolated instance and wonder why I complain; I see
| it as one more student sociology experiment by one more dippy professor who
| thinks the Internet is a "fertile ground for socio-environmental research."
| 
| In spite of your intentions, which I knew in advance or at least assumed in
| advance to be "good", the effect of your survey is to hasten the Internet's
| downslide into common-market status.  We must establish, here and every day
| thereafter, that unsolicited mass mailings are _strongly_prohibited_ by the
| Internet code of ethics.
| 
| You can begin this process by posting an apology to the mailing lists you
| targeted in your original post.  I am still waiting to see this done.  I am
| not satisfied that you understand the problem or that the steps you have
| taken so far mitigate in any substantial way the damage you have caused.
| Act now.
| 
| Paul
+---

here's the spam chapter in nroff source code.

---

.ds ID "$Id: main.me,v 1.3 2001/10/04 18:15:06 vixie Exp $"
.ds CT "Unwanted E-mail
.so ../ChapterStart.me
.lp
At the time of the first edition of this book (1992\(en1994), the e-mail world
was a simpler place.  For one thing, the vast majority of the Internet's
population were research and education, or commercial entities who were on the
Internet to support research and education users.
.pp
There were some abuses, in particular a few shady, bottom-feeding companies
who converted the Usenet Map Project's data and the ARPAnet ``whois'' database
into mailing lists and sold them.  Apparently, the Internet's ``operator''
class had built ourselves into an excellent demographic for direct mail \(em
we were well paid professionals, purchasing managers in both our jobs and in
our homes.  And we had conveniently (read: ``naively'') published several
electronic directories so that we could locate each other in times of need.
These abuses, while irritating and insulting, were only the nose of the camel
that was to come in later.
.pp
Most of the early Internet protocols required little or no authentication.
You did not need to prove who you were in order to initiate e-mail transactions
simply because in those early days, everyone who had an Internet connection
was self-selected or peer-selected to have good manners.  If you abused your
ARPAnet connection then the government (who was paying for ARPAnet's operation)
could disconnect you.
.pp
Gradually the government stepped away from Internet funding and the commercial
communications sector took over.  Funding no longer comes from a single source
but rather from all sources (and all destinations).  Without government money
and supervision, the Internet became an organic, evolving, self-governed mesh
of different entities, including the old research and education community but
also including a fast growing commercial data services sector.
.pp
Good manners stopped being a prerequisite to Internet connectivity!  It is
difficult to overstate the significance of that seemingly-small change.
Instead of being connected to a community for some bilateral purpose, it
became possible to be connected for a completely unilateral purpose.  None of
the people, or the technology, were prepared for this shattering change of
affairs.  Network owners must still cooperate with each other in order that
``connectivity'' exist in a global sense \(em but end users need cooperate
with nobody in order to make use of that connectivity.  And in a disturbing
trend, they increasingly don't.
.pp
As of this second edition (2000\(en2001), a significant and growing fraction
of the e-mail received by the authors is so-called ``spam'' \(em unwanted and
unsolicited nonpersonal ``junk'' mail.  The senders are sometimes paying their
connectivity bills and are often breaking no laws, so they don't (as a class)
see any problem with what they do.  E-mail is a _IT(lot) cheaper than postal
mail, after all, and kills fewer trees.  Those of us who don't actually want
this junk in our mailboxes are told to ``just hit delete'', or to register
with a global ``opt out list'' which some spammers plan to filter against, or
to request removal from each spammer's database when they first locate (and
spam) us, or to stop worrying since ``this is a one-time mailing.''
.pp
We plan to do none of those things.  Universal connectivity is not a right
\(em no one can be forced to accept traffic they don't want to receive.  We
came into the Internet when the rule was ``never add someone to a mailing list
without her permission,'' and for the authors, that rule still applies.
Anyone who wants connectivity without that rule can have all the connectivity
she wants \(em to other people who also want connectivity without that rule.
We have no wish to prevent connectivity between consenting parties \(em anyone
who wants to be spammed ought to be spammed.  But _IT(we) are not a consenting
party to spamming.  To the extent possible, we will make sure that no spam
enters, exits, or benefits from any network we operate.
.pp
If you're of like mind, and you want to use Sendmail to help enforce your 
rules on the networks and servers you operate, this chapter will show you how.

.sh +1 "Transmission
.lp
If you're an ISP and you want to prevent your resources from being used to
send spam, there are a number of different areas you'll have to watch.  One is
packet level transport.  If a PPP user on your modem pool is able to initiate
outbound SMTP sessions toward the greater Internet, then you can be sure that
many of them, including some spammers, will do so.  Your Sendmail
configuration won't be relevant in that case since the traffic is only passing
through your terminal servers, switches, and routers.  Some ISPs have shut
off outbound SMTP using a firewall in order to force their customers to use
local SMTP relays rather than making direct connections.  It's also possible
with some _SM(ISO-L4) equipment (that means smart switches and smart routers)
to transparently intercept outbound SMTP and direct it to a local SMTP server.
Obviously these activities are beyond the scope of a book about Sendmail \(em
so we'll just start by making the assumption that your local Sendmail server
is somehow required to be involved in the outbound passage of e-mail.
.pp
The big thing you need is on by default: logging.  When someone outside your
network complains to you that one of your users spammed them, it's very
important that you be able to _EX(grep) your _EX(syslog) files to discover
whether the transaction in question really did come through your service.
(It is all too common for the actual headers of e-mail spam to be forged, and
occasionally these forgeries are designed to cast unearned guilt on someone
\(em one of your customers, perhaps \(em rather than merely to avoid capture.)
Another fun thing to do with your _EX(syslog) files is to extract statistics
about how many mail messages each of your users sends in a given period of
time.  Note that this has privacy implications\** and that you should not be
keeping track of who the mail is sent to on any statistical basis.  But it's
useful for you to get a daily report showing how many e-mail messages were
sent by each of your customers.  If someone who usually sends one message per
week sent 50,000 messages last night, you're probably in for a long day.
.(f
\** Note that some privacy advocates strongly oppose forcing outbound mail
to go through an ISP's relays, and we've even heard of _EX(uucp) being used
over _EX(ssh) tunnels as a way to bypass this.  Fortunately, this kind of
trick requires an outside party to cooperate, which won't be true for spam.
.)f
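.pp
As a rough illustration of that daily report, the following Python sketch
counts messages per envelope sender from _EX(syslog) lines.  It assumes the
common log shape in which each accepted message produces one line containing
_EX(from=<address>); the function names, the thresholds, and the sample lines
are ours, and real log formats vary by version and configuration, so treat the
pattern as a starting point only.  It deliberately ignores the _EX(to=) lines,
in keeping with the privacy concern above.
.E+
```python
import re
from collections import Counter

# Assumed shape of a log line: "... sendmail[pid]: qid: from=<addr>, ..."
FROM_RE = re.compile(": from=<?([^>, ]+)>?,")

def messages_per_sender(lines):
    """Count accepted messages per envelope sender; recipients are ignored."""
    counts = Counter()
    for line in lines:
        if "sendmail[" not in line:
            continue
        match = FROM_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts

def unusual_senders(today, usual, factor=100, floor=1000):
    """Senders at or above `floor` today and `factor` times their usual volume."""
    return sorted(s for s, n in today.items()
                  if n >= floor and n >= factor * max(usual.get(s, 0), 1))

sample = [
    "Jan  2 03:04:05 mx sendmail[101]: l0201: from=<alice@example.com>, size=900, nrcpts=1",
    "Jan  2 03:04:06 mx sendmail[101]: l0201: to=<bob@example.net>, stat=Sent",
    "Jan  2 03:05:00 mx sendmail[102]: l0202: from=<Alice@example.com>, size=800, nrcpts=2",
]
counts = messages_per_sender(sample)
```
.E-
Feeding yesterday's counts and a longer-term average to
_EX(unusual_senders) gives the short list of accounts worth a closer look.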
.pp
Beyond logging, your spam transmission problem boils down to a strong AUP\**
.(f
\** Acceptable Use Policy
.)f
which you'll bind to your service agreement and enforce with nonrefundable
deposits.  Spammers are usually quite aware that nobody wants to receive what
they want to send, and if you make it too easy for them to abuse your service,
you can bet that they will do so.  The average professional spammer spends
quite a bit of time identifying providers from which to inject their traffic,
and if your _EX(abuse@) mailbox is not staffed on weekends or if the person
who reads it isn't empowered to suspend and disconnect customers in real time,
word will get around and your service will be repeatedly abused until\** you
make things better.
.(f
\** During this period, you can expect to receive a high volume of
complaints from distant end-victims.
.)f
.pp
The one other thing you can do to make your resources less amenable to the
transmission of spam is to limit the number of recipients each message is
allowed to have.  Very few legitimate messages have more than a handful of
envelope recipients.  If you set the _EX(MaxRecipientsPerMessage) option to
something like 25 or so, only a few of your customers will notice, and
they can be accommodated by moving them to whatever server you use for mailing
lists and other legitimate bulk mail tasks.  A Sendmail consultant or any
competent programmer can make _EX(MaxRecipientsPerMessage) into a per-user
option rather than a global one.  Use the source \(em that's what it's for.
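.pp
The per-user variant might look something like the following Python sketch.
It is only a model of the policy, not a Sendmail patch: the table of
overrides, the function names, and the reply strings are our own illustrative
choices.
.E+
```python
DEFAULT_MAX_RCPTS = 25  # the global figure suggested in the text

# Hypothetical per-user overrides, for example a customer on the list server.
OVERRIDES = {"listmaster": 5000}

def recipient_limit(user, overrides=OVERRIDES, default=DEFAULT_MAX_RCPTS):
    """Return the envelope recipient limit that applies to this user."""
    return overrides.get(user, default)

def check_rcpt_count(user, rcpts_so_far):
    """Decide whether one more RCPT is acceptable for this message."""
    if rcpts_so_far >= recipient_limit(user):
        return "452 Too many recipients"
    return "250 Recipient ok"
```
.E-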

.sh +0 "Relay
.lp
OK, so let's say you've configured your routers and switches and terminal
servers so that no outbound SMTP traffic is allowed except from your own
Sendmail servers, and you've figured out how to watch those servers very
carefully.  The next spam-related thing to watch out for is
_IT(third party relay).  A third party in this case means someone who is
neither an intentional agent for the sender, nor an intentional agent for
the recipient.  The potential third party, your Sendmail server, must not
forward (that is, _IT(relay)) e-mail traffic unless either the sender or
recipient (or both) are customers, partners, or trusted affiliates.  In
other words, don't be an _IT(unintentional) agent for _IT(any) sender or
_IT(any) recipient.
.pp
Why is this important?  Let's revisit the average spammer's mindset\** and
.(f
\** Distasteful though it is.
.)f
note that they already know that nobody wants their spam and that any
leased-line or colocated Internet connection they get won't last long if
they use it to initiate spam.  So in order to deliver their unwanted spew
as far from their nest as possible, they use unmonitored injection points
such as the modem pools mentioned earlier.  However, these modem pools tend
to have relatively low speeds.  If the goal is to send a 4Kbyte message to
500,000 victims, that's ~2Gbytes of traffic, which at 15Kbytes/second (ISDN)
will take about 37 hours to transmit.  It's unlikely that the owner of that
modem pool will endure 37 hours worth of complaints without taking some kind
of action.
.pp
So, enter the third party relay.  If a spammer knows of 25 mail servers which
are willing to accept mail ``from the outside'' even though the destination will
also be ``to the outside,'' then instead of initiating 500,000 SMTP transactions
to 500,000 different victim servers they can initiate 5,000 SMTP transactions
to 25 relays (that's 200 sessions per relay) with 100 recipients per SMTP
transaction.  This is only 20Mbytes of outbound traffic and at ISDN speeds
can be finished in about 40 minutes.  Obviously these are untuned numbers, and
a motivated professional spammer would invest a lot of effort in finding out
how many recipients each transaction should have, and how many transactions
each relay should have, and so forth, in order to postpone detection while
also maximizing throughput.
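.pp
The arithmetic behind those figures can be checked with a few lines of
Python.  This is only a back-of-the-envelope sketch using the same round
numbers as the text (decimal kilobytes, a steady 15Kbytes/second, no protocol
overhead); the function name is ours.  Message data alone accounts for roughly
22 minutes of the relay case, and the envelope commands for 500,000
recipients, which this sketch does not model, add to that.
.E+
```python
def transfer_seconds(transactions, kbytes_per_message, kbytes_per_second):
    """Time to push one copy of the message per SMTP transaction."""
    return transactions * kbytes_per_message / kbytes_per_second

ISDN_RATE = 15        # Kbytes/second, as in the text
MESSAGE_SIZE = 4      # Kbytes

# Direct delivery: one transaction per victim.
direct_hours = transfer_seconds(500_000, MESSAGE_SIZE, ISDN_RATE) / 3600

# Via 25 open relays: 200 sessions each, 100 recipients per transaction.
relay_transactions = 25 * 200
relay_minutes = transfer_seconds(relay_transactions, MESSAGE_SIZE, ISDN_RATE) / 60

print(round(direct_hours), "hours direct")
print(round(relay_minutes), "minutes of message data via relays")
```
.E-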
.pp
A professional spammer will also spend a lot of time searching for new relays,
to try to spread their workload out as thinly as possible.  We estimate that
there are tens of thousands of open relays at any given moment.  Every time
one is closed down due to spam complaints, another is installed somewhere
else.  Modern Sendmail is unwilling to relay for third parties by default, but
for much of its history this was not the case.  A lot of those older servers
are still out there, and many of them run unattended or are attended only by
nontechnical personnel or by people who do not understand any language in
which you can issue complaints.  It's a huge problem and we urge you to avoid
making it worse: ensure that your Sendmail servers will only relay a piece of
mail if it was sent by, or destined for, one of your customers, agents, or
affiliates.
.pp
Because Sendmail's defaults are correct in modern versions, we won't go into
great detail about how to prevent third party relay from occurring.  You'll be
adding all your local domain names to _EX(/etc/mail/relay_domains) or some
equivalent file, and Sendmail will search that file during the _EX(check_rcpt)
ruleset, which is the earliest moment when the envelope sender, initiator
IP address and domain name, and envelope recipient are all known.  Mail which
is coming from a trusted source or going to a trusted destination is considered
OK.  Everything else resolves through the _EX(error) mailer, as in
.E+
R$*            $#error $@ 5.7.1 $: "550 Relaying denied"
.E-
which causes the _EX(RCPT) verb to fail during the _EX(SMTP) transaction, thus
informing the sender that what they're trying to do won't work.  Let them move
along and find a more willing accomplice for their evil deeds.
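.pp
The decision itself is simple enough to sketch in Python.  What follows is a
simplified model of the rule described above, not Sendmail's actual
_EX(check_rcpt) ruleset: the function names are ours, the domain matching is a
plain suffix test, and we assume the client's domain name has already been
established from its IP address.
.E+
```python
def in_domains(host, domains):
    """True if host equals, or is a subdomain of, any listed domain."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)

def check_rcpt(client_host, rcpt_address, relay_domains):
    """Accept if the source or the destination is one of ours, else refuse."""
    rcpt_domain = rcpt_address.rpartition("@")[2]
    if in_domains(client_host, relay_domains):
        return "250 Recipient ok"       # sent by a customer
    if in_domains(rcpt_domain, relay_domains):
        return "250 Recipient ok"       # destined for a customer
    return "550 5.7.1 Relaying denied"  # third party relay

RELAY_DOMAINS = {"example.com"}  # stand-in for the relay domains file
```
.E-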

.sh +0 "Reception
.lp
Inbound spam is the most visible part of the overall spam problem, simply
because of the time it takes to get it out of our own personal inboxes.  Back
in December 2000, a personal friend whose domain has existed since the mid
1980's and is therefore listed on every ``millions of guaranteed fresh e-mail
addresses for only $6.95'' _SM(CDROM) ever published, analyzed his _EX(syslog)
and made the following report:
.(q
Here are some interesting stats for you... You can see how bizarre
it really is.  In 289,604 recipients, over 1843 spams, there are
a maximum of 455 going to one recipient.  And a bogus one at that!
.)q
That was quite a while ago, before his problem got so bad that he could no
longer host his own inbox on a T1 (1.5Mbit/second) line.
.pp
Inbound spam doesn't just hit the mailboxes of network and system operators,
however.  We're not the real target \(em our customers are.  A few successful
spam runs per night can consume tens of megabits of link capacity, and tens of
gigabytes of mailbox capacity.  This can lead to service level complaints \(em
``Why is the network so slow?'' and ``What do you mean my mailbox is over
quota?''  being two favourites.  Most providers are forced by spam to
overprovision their links and their mail servers, in addition to whatever
costs they incur in filtering the input stream and cleaning up whatever the
filters miss.
.pp
So, how can Sendmail help?  Well, again, the modern defaults are pretty good.
Quite a bit of spam has something wrong with its headers \(em for example, the
_EX(From:) domain won't exist.  Sendmail's defaults throw a lot of this kind
of trash away with only a _EX(syslog) entry to mark its passing.  However, the
spammers have a lot more time to work on their half of this problem (sending)
than we do (filtering) and so any filters related to content (headers or body)
are inherently weak.
.pp
One exception to this is Rhyolite _SM(DCC), the Distributed Checksum
Clearinghouse\**.  _SM(DCC) is a set of freely available open source tools
which bolt onto Sendmail via the lately introduced _EX(milter) interface and
allow a Sendmail server to detect similarities between mail received ``here,
now'' and mail received ``elsewhere, recently.''  Depending on how you
configure it, _SM(DCC) can merely mark suspected spam with an _EX(X-DCC:)
header, or drop it altogether, or simply requeue it so that Sendmail can
do another _EX(DCC) lookup in 30 minutes to see if the degree of distributed
similarity goes up during that period.
.(f
\** See _EX(http://www.rhyolite.com/anti-spam/dcc/) for details.
.)f
.pp
The _SM(DCC) tools are available completely free of charge, and include both
the client and server parts so it's possible to set up your own _SM(DCC) cloud
without necessarily connecting it to _SM(DCC) clouds being run by others \(em
though we've found that the more Sendmail servers that participate in a given
_SM(DCC) cloud, the more spam can be detected.  Note that in its most
primitive form, _SM(DCC) can be used to detect and even block outbound mail as
well, so it's not a purely inbound tool.  Also note, though, that _SM(DCC)
detects _IT(bulk) e-mail rather than _IT(spam).  You have to tell _SM(DCC)'s
``whitelist'' about all known sources of legitimate bulk e-mail, such as
mailing lists, customer newsletters, and so on.  This is not an unrealistic
cost when compared to _SM(DCC)'s observed benefits.
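.pp
To make the idea concrete, here is a toy Python model of a checksum
clearinghouse.  It is emphatically not the _SM(DCC) protocol or its checksum
algorithms: we assume a single in-memory counter, an exact SHA-256 digest of
the whitespace-normalized body, and an arbitrary threshold, and every name in
it is ours.  It does show why the whitelist matters, since the counter by
itself can only measure bulk.
.E+
```python
import hashlib
from collections import Counter

class Clearinghouse:
    """Counts how many times each message body has been reported."""

    def __init__(self, bulk_threshold=20, whitelist=()):
        self.bulk_threshold = bulk_threshold   # arbitrary illustrative value
        self.whitelist = set(whitelist)        # known legitimate bulk senders
        self.seen = Counter()

    @staticmethod
    def checksum(body):
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report(self, sender, body):
        """Record one sighting and return 'ok', 'bulk', or 'whitelisted'."""
        if sender in self.whitelist:
            return "whitelisted"
        key = self.checksum(body)
        self.seen[key] += 1
        return "bulk" if self.seen[key] >= self.bulk_threshold else "ok"
```
.E-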
.pp
Ultimately, the best known tool for keeping inbound spam out of your network
is to filter it out based on its source rather than its content.  A number of
bureaus of concerned Internet citizens now exist who publish lists of known
spam sources, and the publication format was designed for Sendmail to process
it in real time.  Sometimes this format is called by the name _SM(RBL)\(tm,
which stands for Realtime Blackhole List.  _SM(RBL)\(tm is a service mark of
Mail Abuse Prevention System LLC, who pioneered source-based spam filtering
and who invented the format Sendmail now uses for accessing all
_SM(RBL)\(tm-like lists.  To teach your Sendmail to check mail sources
against a published blackhole list such as the MAPS RBL, add something like
.E. "FEATURE(`dnsbl', `blackholes.mail-abuse.org')"
to your _EX(*.mc) file and remake your _EX(*.cf) file.  Note that some lists
are subscription-based and you should check with their publishers before you
change your Sendmail configuration to depend on them.  It's possible to
check your mail sources against more than one blackhole list, so for example
you might reject mail from known spam sources using one list, and from known
open relays using some other list.  There can be noticeable loss of performance
when you check too many lists, though, since each one will add a DNS lookup
to every inbound mail message you receive.  Some lists are available in bulk
form in order to limit this performance loss \(em again, check with the list's
publisher to find out what's possible before you configure your Sendmail.
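.pp
Under the hood, each such check is an ordinary DNS query.  The Python sketch
below shows the widely used convention, which we assume here: reverse the
octets of the connecting IPv4 address, append the list's zone, and treat any
address record in the answer as ``listed.''  The function names are ours, the
example address comes from a documentation range, and the lookup uses the
system resolver with no caching or timeout handling, so a resolver failure is
indistinguishable from ``not listed.''
.E+
```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the name to look up, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.strip(".")

def is_listed(ip, zone):
    """True if the list publishes an address record for this source."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False

def first_listing(ip, zones):
    """Check several lists in order; each one costs another DNS lookup."""
    for zone in zones:
        if is_listed(ip, zone):
            return zone
    return None
```
.E-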

.sh +0 "Definition
.lp
But what exactly _IT(is) e-mail spam, anyway?  You'll need a consistent
definition that you use to classify both inbound and outbound traffic.  One of
the questions which is still very much open in the minds of many folks is
whether an e-mail message must be provably ``bulk'' before it can be
considered spam.  If you adhere to the ``bulk'' standard, you won't be able to
act on incoming complaints until you have more than one complaint about the
same outbound e-mail message, and you'll be in infinite regress trying to
determine what ``the same'' means.
.pp
Far better, in the authors' view, to let a message be provably and
deterministically ``spam'' or ``not spam'' based entirely on knowledge gained
from a single complainer.  The standard for ``spamness'' which most embodies
this principle was found at _EX(http://mail-abuse.org/standard.html) and is
reproduced here:
.(q
STANDARD:
.sp 0.5v
An electronic message is ``spam'' _SM(IF): (1) the recipient's personal
identity and context are irrelevant because the message is equally applicable
to many other potential recipients; _SM(AND) (2) the recipient has not
verifiably granted deliberate, explicit, and still-revocable permission for it
to be sent; _SM(AND) (3) the transmission and reception of the message appears
to the recipient to give a disproportionate benefit to the sender.
.sp 0.5v
DISCUSSION:
.sp 0.5v
(i) Trivial or mechanised personalization such as ``Dear Mr. Jones, we see
that you are the holder of the _SM(JONES.COM) domain'' does not make the
personal identity of the recipient relevant in any way.
.sp 0.5v
(ii) Failing to click the ``do not send me marketing literature by e-mail''
button in a web sign-up form does not convey explicit permission.  Only when
the default result is ``no followup e-mail'' _SM(AND) the inbox impact is
clearly stated before any action which changes this result, can permission of
this kind be conveyed.
.sp 0.5v
(iii) The appearance of disproportionate benefit to the sender, and the
relevancy of the recipient's specific personal identity, are authoritatively
determined by the recipient, and are not subject to argument or
reinterpretation by the sender.
.sp 0.5v
(iv) Non-personal e-mail always places a disproportionate cost burden on the
recipient, and is considered to disproportionately benefit the sender unless
it was verifiably solicited or by the recipient's willing exception.
.sp 0.5v
(v) A message need not be offensive or commercial in order to fit the
definition of ``spam.''  Content is irrelevant except to the extent necessary
to determine personal applicability, consent, and benefit.
.)q
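.pp
Because the standard is a conjunction of three tests, it can be written down
as a small predicate.  The Python sketch below is our own paraphrase for
illustration; the field names are invented, and the judgments they record are
the recipient's, as point (iii) requires.
.E+
```python
from dataclasses import dataclass

@dataclass
class Complaint:
    """One recipient's assessment of one message."""
    identity_irrelevant: bool        # (1) equally applicable to many others
    permission_granted: bool         # (2) verifiable, explicit, still revocable
    sender_benefits_unduly: bool     # (3) as it appears to the recipient

def is_spam(c):
    """All three conditions of the standard must hold."""
    return (c.identity_irrelevant
            and not c.permission_granted
            and c.sender_benefits_unduly)
```
.E-
Nothing in this predicate counts copies or inspects content, so a single
complaint is enough to decide.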
.pp
We've heard of arguments that such a standard places too much power in the
hands of recipients.  In our view, recipients are paying the majority of the
cost of e-mail transport, and thus ought to have the strongest voice in what's
sent (or not) to them.  Besides which, such an argument presumes that there's
a piece of mail that a sender isn't certain was solicited.  Our advice is:
_IT(don't send it then!).

---