[arin-ppml] Number of routes, IPv6 vrs IPv4.

Sun May 31 12:11:52 EDT 2009

In all the talk about how liberal we want to be with giving out
IPv6 number resources there seems to be little concern for the
technology involved.  I want to go down that road a little bit.

I want to get off the table the notion that these addresses are not
going to be used on the Internet.  While I agree there are applications
where folks need globally unique addresses for non-Internet connected
networks the fact of the matter is virtually all (I'm sure well
over 99%) of ARIN allocated space appears on the Internet at some
point.  While I think there are good reasons to look at policy in
these areas and have better mechanisms for non-Internet usages, the
core of what ARIN does is provide addresses to the Internet, so I
think that is where we should focus.

Now, to the bit of technology.  IPv6 addresses are larger than IPv4
addresses.  Specifically, they are four times as large (128 bits
versus 32 bits).  Routers have
to hold the routing information in memory, and so this means an
equivalent number of IPv6 routes will take four times the memory.
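
To put a number on the 4x, here is a minimal Python sketch using the
standard ipaddress module.  The two addresses are documentation
examples I picked for illustration, and a real router stores more per
route than the bare address, so treat the ratio as a rough guide and
not a measurement of any platform.

```python
import ipaddress

# Example addresses from documentation ranges (chosen for illustration only).
v4 = ipaddress.ip_address("192.0.2.1")
v6 = ipaddress.ip_address("2001:db8::1")

v4_bytes = len(v4.packed)  # raw address size in bytes
v6_bytes = len(v6.packed)
ratio = v6_bytes // v4_bytes

print(f"IPv4: {v4.max_prefixlen} bits, IPv6: {v6.max_prefixlen} bits, ratio: {ratio}x")
```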

You might say, so what, memory is cheap these days.  That can be
true, but in many cases we're not talking about PC DRAM.  Routers
and switches use many sorts of specialized memory in order to make
lookups happen fast enough to route packets at wire speed.  This
specialized memory is much more expensive.

Every platform is different.  I can't provide some general rule,
and indeed finding documentation on some of the inner workings is
difficult unless you are a major purchaser of said equipment and
get the behind the scenes tour from a vendor.  I will point out one
available example that I think is fairly "middle of the road" in
terms of the issue.

There has recently been a lot of press about the Cisco 6500 "Sup720-3B"
platform.  This platform uses TCAM
(http://en.wikipedia.org/wiki/Content-addressable_memory#Ternary_CAMs) to
hold routing information so that switching decisions can be made
at high speed.  This is fairly expensive memory.  The reason this
made press is there is only enough TCAM to hold 256,000 IPv4 routes,
and the IPv4 routing table is now up in the 290,000 range.  You can
find lots of articles about how folks are choosing to throw away
prefixes to fit in the TCAM limit.

The result though is there is a lot of documentation available about
this platform.  Here's a presentation from NANOG:
http://www.nanog.org/meetings/nanog39/presentations/fib-desilva.pdf (Page
10)

You'll find that the original Sup720-3B defaulted to 192k IPv4
routes, and 32k IPv6 routes.  This could be user-configured, allowing
256k IPv4 routes with 0 IPv6 routes, or 64K IPv6 routes with 0 IPv4
routes.  This all falls out of IPv6 routes being 4x as large.

In the upgraded hardware the vendors came up with some tricks
("double lookup").  The Sup720-3BXL can hold 512k IPv4 routes and
256k IPv6 routes.  Using these software tricks they are able to get
the multiple down to 2x.
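
The arithmetic behind those limits can be sketched as below.  This is
a simplified model I am assuming for illustration (one shared table
counted in IPv4-sized slots, with each IPv6 route costing a fixed
multiple); the function name and the model are mine, not how any
vendor documents the hardware.

```python
def max_ipv6_routes(total_slots, ipv4_routes, slots_per_ipv6):
    """IPv6 routes that fit after reserving one slot per IPv4 route.

    total_slots is counted in IPv4-route-sized entries; slots_per_ipv6 is
    the assumed cost of one IPv6 route (a hypothetical, simplified model).
    """
    if ipv4_routes > total_slots:
        raise ValueError("IPv4 routes alone exceed the table")
    return (total_slots - ipv4_routes) // slots_per_ipv6


# All-IPv6 cases, using the table sizes and multiples quoted above.
sup720_3b = max_ipv6_routes(256_000, 0, 4)
sup720_3bxl = max_ipv6_routes(512_000, 0, 2)

# Room left for IPv6 on the 2x box once ~290,000 IPv4 routes are loaded.
dual_stack_room = max_ipv6_routes(512_000, 290_000, 2)

print(sup720_3b, sup720_3bxl, dual_stack_room)
```

Under this model the smaller table cannot even take the 290,000 IPv4
routes by themselves (the function raises an error), which is the
situation that made the press.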

The really scary thing here is that, in the short to medium term
giving out the same number of IPv6 routes as we have IPv4 routes
in the past is likely to exhaust the available memory on many
devices.  This is both because IPv6 routes are larger, and in the
short and medium term providers will have to hold both an IPv4 and
IPv6 routing table.

Some folks have taken a "so what, routing isn't ARIN's problem"
position.  That's true, and to a degree I respect the position.
However, it turns out the problem here is worse than many realize.

If you look back to the dark days of the Internet when the routing
table grew to where AGS+'s couldn't hold the table anymore and we
implemented CIDR you'll find lots of companies implemented prefix
length filtering.  The reason you can't route smaller than a /24
today is largely rooted in history.  Indeed, for many years particular
backbone providers wouldn't route smaller than a /19, so if you had
a /20 well, you just didn't get access to a portion of the Internet.
It was all quite messy.

The 6500 situation is causing a sort of mini-repeat.  The box is
so popular in edge applications that folks are having to figure out
how to deal with a box that can't hold all the routes.  It's easy
to find suggestions on Google, and once again they typically involve
filtering at a /19 or /20 border, default routing to an upstream
router or provider and dealing with the lack of routing.

So folks know how to deal with this, why would IPv6 be worse?  Well,
notice the solutions in the past have been prefix length filtering.
If ARIN provides all /32's then there is no way to filter based on
prefix length.  A filter of /32 is nothing, of /33 is everything.  We
will have inadvertently removed the primary tool providers use to deal
with this issue.
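
A small Python sketch of why the tool stops working.  The filter here
is the simplest one I could write (keep anything no longer than a
given length), and every prefix in both tables is made up for
illustration, not taken from a real routing table.

```python
import ipaddress


def length_filter(prefixes, max_len):
    """Keep only prefixes whose length is at most max_len bits."""
    return [p for p in prefixes if ipaddress.ip_network(p).prefixlen <= max_len]


# Made-up IPv4 announcements of mixed lengths.
v4_table = ["10.0.0.0/8", "172.16.0.0/19", "192.0.2.0/24", "198.51.100.0/24"]
# Made-up IPv6 announcements, every one of them a /32.
v6_table = ["2001:db8::/32", "2001:db9::/32", "2001:dba::/32"]

v4_kept = length_filter(v4_table, 20)     # sheds the /24s, keeps the rest
v6_kept_32 = length_filter(v6_table, 32)  # sheds nothing
v6_kept_31 = length_filter(v6_table, 31)  # sheds everything

print(len(v4_kept), len(v6_kept_32), len(v6_kept_31))
```

With mixed lengths the operator can pick how much of the table to
shed.  With uniform /32's the only choices are all or nothing.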

How will providers filter?  I don't know.  Really the only information
left in BGP at that point is AS-PATH, so that's the only filtering
providers could implement.  There's a real danger that providers
look at "Hmm, this is the ASN for Facebook, and this is the ASN
for Lolcats.  More people want Facebook, filter the Lolcats."
Providers could implement fees to get a routing table slot, fully
settled across peering connections.  The fees would rise to keep folks
from injecting routes that could not be paid for.

Now, there is some good news here, from the Weekly Routing Table report
posted to Nanog:

] ARIN Region Analysis Summary
] ----------------------------
] 
] Prefixes being announced by ARIN Region ASes:                    124963
]     Total ARIN prefixes after maximum aggregation:                65735
]     ARIN Deaggregation factor:                                     1.90
] Prefixes being announced from the ARIN address blocks:           125685
]     Unique aggregates announced from the ARIN address blocks:     51667
] ARIN Region origin ASes present in the Internet Routing Table:    13003
]     ARIN Prefixes per ASN:                                         9.67

If we can give everyone with an ASN an appropriate sized block so
they only need 1 block we likely can reduce the size of the routing
table by up to 9.67 times.  Folks are announcing multiple blocks
in part because that's how the system works, you get a new block
every year.  10 prefixes per ASN, and ARIN has been around 11
years.....interesting.
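
The ratios in the report can be re-derived from its own counts.  In
this sketch the variable names are mine, and note that the 9.67 only
comes out if the "from the ARIN address blocks" count is divided by
the ASN count; that pairing is my inference from the arithmetic, not
something the report states.

```python
# Counts copied from the ARIN Region Analysis Summary quoted above.
announced_by_arin_ases = 124_963
after_max_aggregation = 65_735
announced_from_arin_blocks = 125_685
origin_ases = 13_003

deagg_factor = announced_by_arin_ases / after_max_aggregation
prefixes_per_asn = announced_from_arin_blocks / origin_ases

# With exactly one route per ASN, the regional table would be this size.
one_route_per_asn_table = origin_ases

print(round(deagg_factor, 2), round(prefixes_per_asn, 2), one_route_per_asn_table)
```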

My personal feeling is both extremes are bad.  Current IPv6 policy
is too conservative, still reflecting both an IPv4 mind set and also
some failed notions of addressing hierarchy.  However the idea that
we can give a block to everyone who walks up, circa the allocation
policy of 1989 is equally foolish, and will lead to widespread
industry issues.  We need to chart a reasonable middle ground,
perhaps one more liberal than we have now but not wide open.

Personally I think the focus should be on getting everyone with an
ASN an appropriately sized block.  If we do that, we can make the
IPv6 routing table 7-10 times smaller than the IPv4 routing table.
That makes the transition possible, as remember both tables have
to fit in memory for the time we are dual stacked.  Given that IPv6
addresses are 4x as big, achieving an 8x reduction is really only
saving half the memory.  Once we've gotten this aspect right we can
focus on opening up space to more players.
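
The "only saving half" point is just a ratio, sketched here in Python.
The function is a hypothetical helper of mine, and it assumes memory
scales linearly with route count and route size, which real hardware
only approximates.

```python
def relative_memory(route_reduction, size_multiplier=4):
    """IPv6 table memory as a fraction of the IPv4 table's memory.

    Assumes memory is proportional to (number of routes) x (route size).
    """
    return size_multiplier / route_reduction


# IPv6 table memory for the 7x to 10x range of reductions.
fractions = {r: relative_memory(r) for r in (7, 8, 10)}

# Dual stack: the IPv4 table (1.0) plus the IPv6 table at an 8x reduction.
dual_stack_total = 1.0 + relative_memory(8)

for reduction, fraction in fractions.items():
    print(f"{reduction}x fewer routes -> {fraction:.2f} of the IPv4 memory")
```

So even with the 8x reduction, a dual-stacked box under this model
needs about one and a half times the memory the IPv4 table takes
today.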

Lastly, there seems to be a notion floating around that lack of
IPv6 address space is the problem.  The majority of the large players
have IPv6 address space.  I think if you looked at the top 10 ISP's
in the US, all of them now have IPv6 address space from ARIN; and
yet only 2 or 3 are offering IPv6 services to their customers.  The
idea that address space is preventing deployment fails right out
of the gate.  Many of these folks have space and are still not
offering services.

It is easy to think the solution is a new entrant.  Clearly if the
incumbents don't want to provide a service then a new entrant will.
This logic doesn't hold though, as the primary reason the existing
operators aren't offering IPv6 is lack of a business case.  If
there is no business case there is no way for a new entrant to make
a go.

So the critical items going forward to me are:

  - Remove the totally arbitrary 200 site requirement.  There are plenty
    of thriving ISP's with fewer than 200 customers.  Colo based
    companies come to mind: your 100,000 sq ft data center may have
    ten 10,000 sq ft customers and 500,000 machines in the building.

  - Make sure everyone with an ASN can easily get a right sized IPv6
    allocation.  Let's make good on the idea of "one route per ASN".
    Let's ensure everyone in the game can get space easily on that
    basis.

  - Come up with sensible requirements for new entrants.  "200
    customers" is not a sensible requirement.  Starting someone off
    with a /32 and then not talking to them is not a sensible way
    of doing business.  RFC 2050 and IPv4 both had the notion of
    "slow start", which fit with the idea of needs based allocations.
    New entrants came back at 3, 6, and 12 months, and then yearly.
    If they were allocating "wrong" it was caught quickly, and fixed.
    We need some mechanism to talk to new entrants more often than
    every 10 years, make sure they are following the community
    rules, and clean up messes before they have collected for 10 years.

I don't think we have proposal(s) that get this right on the table yet.

-- 
       Leo Bicknell - bicknell at ufp.org - CCIE 3440
        PGP keys at http://www.ufp.org/~bicknell/