[arin-ppml] A modest proposal for IPv6 address allocations

William Herrin bill at herrin.us
Mon Jun 1 00:25:45 EDT 2009

On Sun, May 31, 2009 at 1:19 AM, James Hess<mysidia at gmail.com> wrote:
> Why allocate  /32s  from a pool reserved for  /32s?
> It would perhaps make more sense, as a matter of policy, that a
> special allocation strategy be utilized,  that the  /48s,  /32s, and
> /24s  are allocated from just one pool.

Hi James,

You'd think so, but no.

Here's the problem: at the BGP level you can't tell the difference
between a multihomed customer route and a traffic engineering route. A
route is a route is a route.

Now, if you filter distant routes from multihomed entities then you
get a rep for being unreliable since you sometimes can't reach those
folks while your competitors can. Hence you don't filter the routes,
multihomed entity or TE. The net result is that your gains from the
multihomed entity announcing only one primary route instead of two or
three are more than erased by the loss to unfilterable TE routes.

If you allocate /32's from a block that's only /32's and /48's from a
block that's only /48's then anything longer than a /32 in the /32
block is traffic engineering. There are no customer routes mixed in.
As a result, the traffic engineering is filterable in a practical way,
which means you can keep the local route count due to traffic
engineering down to a sensible level instead of having to propagate
everybody's TE routes throughout the world.
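A rough sketch of what that filter could look like, in Python. The pool prefixes are made up (ULA space, purely for illustration) and the function names are my own; the only assumption that matters is that each pool hands out exactly one allocation size.

```python
import ipaddress

# Hypothetical pools, not real registry assignments. Each pool maps to the
# single allocation size handed out from it.
POOL_ALLOCATION_LENGTH = {
    ipaddress.ip_network("fd00::/16"): 32,  # pool containing only /32 allocations
    ipaddress.ip_network("fd01::/16"): 48,  # pool containing only /48 allocations
}


def is_traffic_engineering(route: str) -> bool:
    """Return True if the route is longer than its pool's allocation size."""
    net = ipaddress.ip_network(route)
    for pool, alloc_len in POOL_ALLOCATION_LENGTH.items():
        if net.subnet_of(pool):
            # Anything more specific than the pool's allocation size cannot
            # be a separately allocated customer block, so it must be TE.
            return net.prefixlen > alloc_len
    # Outside the known pools there is no basis for classifying the route.
    return False


def filter_routes(routes):
    """Keep only the routes that are not identifiable as TE."""
    return [r for r in routes if not is_traffic_engineering(r)]
```

With mixed pools the `prefixlen > alloc_len` test has nothing to compare against, which is the whole problem.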

Look at the top entry on last Friday's CIDR report:

ASnum    NetsNow NetsAggr  NetGain   % Gain   Description
AS6389      4292      338     3954    92.1%   BELLSOUTH-NET-BLK -
                                              BellSouth.net Inc.

BellSouth is announcing 4,292 routes into the BGP table which, minus
the traffic engineering, could be compressed into something close to
338 routes. But nobody can do the compression because it isn't
possible for anyone other than BellSouth to determine which of the
extra 3,954 routes are TE and which are multihomed customers.

If we could tell with certainty that the extra 3,954 routes were TE
then most nodes could just drop them with essentially no ill effects
to the network. According to
http://bill.herrin.us/network/bgpcost.html, that offers an annual
systemic cost savings of more than $20M. Just from this one example.
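For anyone checking the CIDR report line, the NetGain and % Gain columns follow directly from the other two. A small sketch, with a function name of my own choosing:

```python
def cidr_gain(nets_now: int, nets_aggr: int):
    """Return (routes saved, percent saved) if an AS aggregated fully."""
    net_gain = nets_now - nets_aggr
    pct_gain = round(100 * net_gain / nets_now, 1)
    return net_gain, pct_gain


# Figures from the AS6389 row of the report quoted above.
bellsouth_gain = cidr_gain(4292, 338)
```

This only reproduces the table's arithmetic; it says nothing about which of those routes are actually TE.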

On Sun, May 31, 2009 at 8:49 PM, Joe Maimon<jmaimon at chl.com> wrote:
> Renumbering is something everyone should desire to avoid, regardless of how
> easy it is, and I would oppose any policy promoting that activity, so please
> clarify whether you were intending for renumbering to occur or not.

Hi Joe,

Here's the skinny on renumbering: it royally sucks. It isn't just the
direct cost. It takes hours after a renumbering event to get two-nines
recovery of connectivity and it takes months to get to five-nines.
Exceptional system architects can maintain connectivity with overlap
between the old and new numbers and clever policy-routing, but how
many admins are exceptional?

The reasons why renumbering is such a problem are many and varied but
you can understand 90% of it with two concepts, both of which fail
horribly during renumbering: gethostbyname and DNS pinning.

Gethostbyname is by far the most common way code monkeys translate DNS
names into IP addresses. Even if you don't use it directly, it lurks
under the hood of most of the APIs you do use. It returns no
information about duration of validity, so nearly every software
developer who uses gethostbyname assumes that the mapping between name
and IP address is valid indefinitely, for however long the
developer cares to keep track of it. And 20+ years of applications
make that assumption.
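To make the failure concrete, here is a minimal sketch of the pattern in Python. The class name is hypothetical; `socket.gethostbyname` is the real call, and note that it returns only an address string with no TTL attached. The resolver is injectable so the behavior can be shown without touching a live DNS server.

```python
import socket


class NaiveClient:
    """Resolves a name once and keeps the address forever.

    This mirrors what many applications do: nothing in the return value of
    gethostbyname says how long the answer is good for, so it never expires.
    """

    def __init__(self, hostname, resolve=socket.gethostbyname):
        self.hostname = hostname
        self._resolve = resolve   # injectable for demonstration purposes
        self._address = None

    def address(self):
        # Only the first call ever consults the resolver.
        if self._address is None:
            self._address = self._resolve(self.hostname)
        return self._address
```

If the server behind `hostname` renumbers while this client is running, `address()` keeps returning the old number until the process is restarted.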

DNS Pinning closes a general web browser vulnerability in client-side
scripting languages like JavaScript, which could be made to scan the
interior of a firewalled network by rapidly changing a DNS-to-IP
address mapping. Greatly simplifying, after reaching a web site, DNS
Pinning prevents the web browser from trying any other IP address for
the site until after the browser is stopped and restarted. This means
that folks using your web server when you renumber won't be able to
reach it again until they close and restart their browsers.
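The same greatly simplified picture as a sketch in Python. The class and method names are hypothetical and real browsers are considerably more nuanced; the point is only that the pin that blocks the attack also blocks a legitimate renumbering.

```python
class PinnedBrowser:
    """Toy model of DNS pinning: the first address seen for a site sticks."""

    def __init__(self, resolve):
        self._resolve = resolve  # callable mapping hostname -> address
        self._pins = {}

    def address_for(self, hostname):
        # The first visit pins the address; later visits skip DNS entirely,
        # so a script cannot repoint the name at an internal host mid-session.
        if hostname not in self._pins:
            self._pins[hostname] = self._resolve(hostname)
        return self._pins[hostname]

    def restart(self):
        # Closing and reopening the browser is the only thing that clears pins.
        self._pins.clear()
```

Whether the mapping changed because of an attacker or because the site moved to a new ISP, `address_for` returns the pinned address until `restart()` is called.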

This having been said, with adequate planning and adequate expertise,
renumbering does actually work for single-homed systems transitioning
from one ISP to the next. Where it doesn't work is for multihomed
systems recovering from a link failure. Multihoming recovery needs to
happen in seconds or minutes. Taking hours to get to 2-nines recovery
is unacceptable.

Unfortunately, the renumbering problem doesn't exist in a vacuum. The
only way to avoid renumbering is to always announce at least one route
of your own into the IPv6 BGP DFZ table. As previously mentioned, that
costs at least $10k per year. Absent an accounting system capable of
billing you and distributing the money to the 30k orgs whose resources
you're consuming just by announcing the route, that's more than $10k
of *other people's money.* People don't like it when you spend their
money. People are funny that way.
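A back-of-the-envelope sketch of why nobody sends you a bill, using the two figures above. The even split across orgs is my simplifying assumption, not something the cost analysis claims.

```python
ROUTE_COST_PER_YEAR = 10_000   # dollars; the "at least $10k" figure above
ORGS_CARRYING_ROUTE = 30_000   # the "30k orgs" figure above


def per_org_share(total_cost: float, orgs: int) -> float:
    """Average yearly cost one route imposes on each org carrying it.

    Assumes the cost is spread evenly, which is a simplification.
    """
    return total_cost / orgs


share = per_org_share(ROUTE_COST_PER_YEAR, ORGS_CARRYING_ROUTE)
```

That works out to roughly a third of a dollar per org per route per year: too small for any one org to invoice, yet it adds up to the full $10k across the system.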

So, renumbering's role in policy ends up being a balance between three factors:

1. The cost of routes in the DFZ
2. Whether the given activity is practical without announcing a route
into the DFZ
3. Where it is practical, the cost of renumbering instead of
announcing a route into the DFZ

Answering your question directly, my plan assumes that single-homed
systems will renumber when they change ISPs, multihomed systems will
not, and by the time you grow into the largest /24 allocation, you're
expected to release your first /48.

The rationale behind single and multihomed systems should be fairly
obvious. I would only add to it that in my ever so humble opinion
there is no entity in ARIN territory whose renumbering cost exceeds
$10k/year yet can't afford to add a DSL line and a tunnel provider.

The reasons for releasing the /48 but keeping the /32 before getting
the /24 are more subtle: How efficiently did you use that /32?
Efficiently enough that you're willing to renumber out of the /48
rather than try to squeeze more efficiency out of the /32.

> So please put a nail into the mantra of registries not being involved in
> routing.

Last year the IRTF RRG made an effort to catalog every way we could
conceive of routing in some future Internet with any prayer of it
working. See http://tools.ietf.org/html/draft-irtf-rrg-recommendation-02
. As you read that draft, one thing that becomes real obvious real
fast is that addressing is routing is addressing.

Registries probably shouldn't try to micro-manage the routing system,
but the nature of the beast is that addressing policy sets the big
picture in which the individual routing decisions are made.

On Sun, May 31, 2009 at 2:42 PM, Kevin Loch<kloch at kl.net> wrote:
> This is called "sparse allocation" method and is what APNIC uses today.  It
> was one of the justifications for giving each RIR blocks of /12
> instead of /23.  I would like to know why ARIN is not using this method.

Hi Kevin,

Sparse is a farce. It was a clever enough idea, but it ferociously
fragments the pool and it wipes out all the other possible levers for
restraining the table size in the vain hope that each AS will announce
one and only one route. We'd be well rid of it.

Bill Herrin

William D. Herrin ................ herrin at dirtside.com  bill at herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
