[ppml] Longer prefixes burden the FIBs of DFZ routers

Tue Aug 21 00:24:23 EDT 2007

Hi Leo,

Thanks again for your helpful response.  You wrote, in part:

>> I didn't mention these, because I assume - perhaps wrongly - that
>> these don't carry the burden of Internet traffic.  It is my
>> impression that when an FIB in a router is handling a packet
>> which matches a /32 prefix which is for that router's IP address,
>> or some other router's IP address, that the packet is most likely
>> to be a BGP message between the routers or some configuration,
>> logging traffic etc.
>
> I am rather sure this assumption is wrong, at least for IPv4.  I'm
> not sure we have enough IPv6 experience to know if it is wrong for
> IPv6 as well.
>
> In IPv4 it would not be uncommon for an ISP to use a /32 "virtual
> address" to hit a pile of load balancers for a Usenet farm, or a
> VoIP switch farm, webmail front ends, or a streaming video farm.
> Many people put their caching resolvers all on virtual IP's and
> anycast them internally with BGP to provide more resiliency.

I have only a partial understanding of this.

My concern was primarily with what burden address policy places on
all DFZ routers.  If some or many ISPs want or need to do other more
demanding things with their DFZ routers, but those things are not
forced upon them by address policy, then that is their choice and
not such a concern for the whole Internet.

On the other hand, if many, most, or all DFZ routers already have to
cope with various common, ISP-specific burdens such that the burdens
specifically caused by address policy are not much more of a
problem, then those common, ISP-specific burdens are of interest
because they set some kind of boundary within which address policy
burdens can be relatively freely added.

> But even if your assumption was right, it's wrong.  That is to say
> if the only "expensive" operation was packets to my router
> loopbacks for iBGP it might look ok to have those lookups take
> more time in steady state.  However all it takes is an attacker
> finding those addresses and DDoSing them with packets to make that
> not work.  If the lookups are more expensive, the result is a much
> more attractive attack target.

DoS attacks are a concern, but to me it is a second-order problem
compared to the task of handling the main volume of traffic.

>> If I was designing a router - which I am not - I would want to
>> know what length prefix to optimise the performance for.  It
>> would be no good saying "Sometimes the router needs to handle
>> /128 so the router must be optimised to forward packets to /128
>> prefixes" when in reality, most of the traffic would be to /32 to
>> /48.

> ...

>> I think it would be a very good idea for the IETF and the RIRs to
>> decide, very carefully on something exactly like this.  I think
>> the IETF and the RIRs should be able to decide that under no
>> circumstances would IPv6 address policy in the next decade or two
>> require DFZ routers to look at any more than 48 bits of address
>> for Internet traffic.
>
> A very interesting idea.  I think a "decade or two" may be too
> long, but given we are in the early phases it may be worth
> considering your idea.  I'm afraid it would have to be a global
> policy to mean something to the vendors, but perhaps a "under no
> circumstances will RIR's require prefixes longer than /XX before
> 2015" might be a way to reduce deployment costs.

Thanks for your tentative support of my idea.

> Of course, the trap is that if that assumption is built into
> hardware, and 2015 comes in a lean time there will be great
> pressure to keep it for economic reasons...

It is difficult to envisage IT adoption and usage beyond ten years
into the future, but we do have two decades of Internet experience
behind us now, and IPv6 has been around for a decade.  I don't think
it should be too hard to fix some kind of length limit for
advertised prefixes, especially considering there is broad agreement
that the total number of prefixes can't be allowed to grow
continually beyond a few hundred thousand.

The consensus on the RAM list seems to be that we will definitely
need some new architectural arrangement for multihoming potentially
millions of end-user networks, rather than just advertising more
prefixes.

> From my own talking to vendor reps they are trying to make some
> hay with the fact that IPv6 routes are quite sparse relative to
> the address space.  In particular, even if you cut off at /112,
> there are likely to be no routes in the /64-/112 space as
> currently deployed.  Also, I believe the largest current
> allocation is a /21, so it's probably unlikely to have a route <
> /18.  If we look just at those ranges, we're down to a space of 54
> bits that are likely to be present in routes (18-64, 112-128).
> 54 bits is a lot more tractable than 128, and fits well within the
> 72 bit TCAM space.
>
> With a good method of hashing (in hardware) this could be done,
> and leave a pretty good chance that prefixes of the "unlikely
> lengths" would not cause significant problems until there were a
> lot of them.

I don't have a clear idea of how hashing would work, since the
system has to make use of all the address bits up to the length of
the prefix - not just segments of them, unless perhaps the 3 most
significant bits can be ignored, because of some prior step which
ensured they all matched 2000::/3.

Also, if some addresses were slower to handle due to hash algorithm
misses, then if this was known to attackers this could lead to the
DoS critique.
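
As a rough illustration of one way hashing could be applied (a toy
sketch only, not a description of any vendor's hardware): keep one
exact-match hash table per prefix length in use, and probe from the
longest populated length down.  Every bit up to the prefix length
goes into the hash key, and the number of probes depends on how many
distinct lengths are populated.  The class name HashFib, the next-hop
labels and the 2001:db8:: example prefixes are all assumptions made
for illustration; the sketch handles IPv6 addresses only.

```python
import ipaddress


class HashFib:
    """Toy IPv6 FIB: one exact-match hash table per prefix length in use."""

    def __init__(self):
        # prefix length -> {leading address bits: next hop}
        self.tables = {}

    def add(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        # The key is every bit of the address up to the prefix length.
        key = int(net.network_address) >> (128 - net.prefixlen)
        self.tables.setdefault(net.prefixlen, {})[key] = next_hop

    def lookup(self, address):
        addr = int(ipaddress.ip_address(address))
        probes = 0
        # Probe the longest populated length first; the first hit wins.
        for length in sorted(self.tables, reverse=True):
            probes += 1
            hop = self.tables[length].get(addr >> (128 - length))
            if hop is not None:
                return hop, probes
        return None, probes


fib = HashFib()
fib.add("2001:db8::/32", "A")        # hypothetical next hops
fib.add("2001:db8:1234::/48", "B")

print(fib.lookup("2001:db8:1234::1"))  # ('B', 1)
print(fib.lookup("2001:db8:ffff::1"))  # ('A', 2)
print(fib.lookup("2001:db9::1"))       # (None, 2)
```

In this sketch an address which matches only a short prefix, or
nothing at all, costs one probe per populated length, which is the
kind of uneven lookup cost an attacker might try to exploit.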

> I would love to have reps from Cisco and Juniper (Foundry,
> Extreme, Force 10?) come to an ARIN meeting and give some
> information on how they handle forwarding IPv6 packets, and if
> ideas like your prefix length limit help them in a significant way
> or not.

I strongly agree.  All Internet users pay for the routers and their
operational costs.  The router manufacturers have to create boxes
which do whatever gymnastics result from address policies, IETF RFCs
etc. and the practical aspects of running DFZ routers.

I get the impression that some people seem to think this hardware
stuff is relatively easy.  I received a private message recently:

>   FIB scaling and speed is a solved problem (the best solutions
>   are proprietary).  It is not a major cost driver in fast routers
>   relative to other features.

but I don't believe this.  It simply does cost more in terms of
memory space, read cycle times, complexity of the CPU work in
running the trie algorithm etc. (or vast width and depth of TCAM) to
handle four million /128 prefixes than four million /24 prefixes,
and the same holds for 250k prefixes of each length.
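
A small sketch of this intuition, assuming the simplest possible
structure: a trie which consumes one address bit per level, filled
with random prefixes.  The function name trie_cost and the prefix
count of 2000 are arbitrary choices for illustration.  Real routers
use multibit or path-compressed tries, or TCAM, so this shows only
the naive case, not what any actual product does.

```python
import random


def trie_cost(prefix_len, count, seed=1):
    """Build a one-bit-per-level trie from `count` random prefixes.

    Returns (number of trie nodes, worst-case steps per lookup).
    """
    rng = random.Random(seed)
    root = {}
    nodes = 1  # count the root
    for _ in range(count):
        bits = rng.getrandbits(prefix_len)
        node = root
        # Walk from the most significant bit down, creating nodes as needed.
        for i in range(prefix_len - 1, -1, -1):
            bit = (bits >> i) & 1
            if bit not in node:
                node[bit] = {}
                nodes += 1
            node = node[bit]
    # A lookup which matches a full-length prefix visits one level per bit.
    return nodes, prefix_len


n24, steps24 = trie_cost(24, 2000)
n128, steps128 = trie_cost(128, 2000)
print("/24 :", n24, "nodes,", steps24, "steps")
print("/128:", n128, "nodes,", steps128, "steps")
```

In this naive trie both the node count and the lookup depth grow with
prefix length.  A path-compressed trie needs at most about two nodes
per prefix whatever the length, which may be closer to what is done
in practice.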

My correspondent - who works for a high-end router company - responded:

>   FIB memory scales mostly with the number of prefixes, not their
>   size (it is super-linear, but not badly so).
>
>   The number of memory cycles needed for FIB lookup is negligible
>   relative to the number of cycles needed for other forwarding
>   functions (e.g., statistics, ACLs, queue scheduling).

While I understand these things are important for many purposes, I
am not sure that Access Control Lists, detailed statistics or
deciding which queue of the output interface to use are relevant to
DFZ routers.  Maybe they are - I think my correspondent knows much
more about this than I do.

> The difference in cycles between v6 and v4 is more than made up by
> the fact that the worst-case packet rate for IPv6 is 2/3 that for
> IPv4.

Still, I think we need to ensure that address policy places burdens
on routers only to the degree that the benefits outweigh the costs.
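
One reading of the 2/3 figure, which the correspondent did not spell
out, is that it compares the smallest possible packets: a 20 byte
IPv4 header or a 40 byte IPv6 header, each followed by a 20 byte TCP
header with no options and no payload.  The sketch below works
through that arithmetic.  It ignores link-layer framing, and the
10 Gbps link rate is an arbitrary assumption which cancels out of
the ratio.

```python
def max_packets_per_second(link_bps, packet_bytes):
    """Packets per second when a link carries only packets of this size."""
    return link_bps / (packet_bytes * 8)


LINK_BPS = 10_000_000_000  # assumed 10 Gbps link, for illustration only
TCP_HEADER = 20            # minimum TCP header, no options
IPV4_MIN = 20 + TCP_HEADER # minimum IPv4 header plus TCP header
IPV6_MIN = 40 + TCP_HEADER # fixed IPv6 header plus TCP header

v4_pps = max_packets_per_second(LINK_BPS, IPV4_MIN)
v6_pps = max_packets_per_second(LINK_BPS, IPV6_MIN)
ratio = v6_pps / v4_pps

print(f"IPv4 worst case: {v4_pps:,.0f} packets/s")
print(f"IPv6 worst case: {v6_pps:,.0f} packets/s")
print(f"IPv6 / IPv4 ratio: {ratio:.3f}")
```

Under these assumptions the ratio is 40/60, or 2/3.  Counting
link-layer overhead or a minimum frame size would give a different
figure.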

> If all IPv6 end-site prefix allocations were a fixed size (say
> /48), that would make things slightly easier.  But not enough to
> get excited about.

I think that even a moderate firming of the ground on which routers
are designed would be worthwhile, provided it did not overly
restrict Internet usage in the future.

> Most vendors are trying to figure out how to implement line-rate
> DPI. FIB lookups are the least of our problems.  If you don't
> believe me, talk to some other vendors.

As far as I know, Deep Packet Inspection is not something which
should be occurring in DFZ routers, at least in terms of handling
ordinary traffic packets.

I think this debate is straddling several related questions:

1 - To what extent can address policy be framed to enable the
    long-term design and manufacture of efficient, less expensive,
    less power hungry routers - without overly restricting Internet
    usage?

2 - To what extent is the length of advertised prefix a factor in
    the costs, efficiency etc. of FIB functions?

3 - To what extent are DFZ routers, in general, already doing things
    which are more demanding than required by plain Internet traffic
    - and therefore, to what extent can the plain traffic burdens
    be acceptably extended to match what the routers must already be
    capable of?

4 - Alternatively to 3, would it be feasible to distinguish between
    routers which handle plain traffic and those which do other
    fancy things, including MPLS, VPNs etc. as many ISPs want and
    need to do?

I get the feeling the answer to the last question is currently No.
In that case, routers will continually get more and more complex and
overloaded with functionality which at least some portion of the
market requires, with all DFZ routers necessarily being of this
over-complex type, despite the actual requirements of plain DFZ
traffic, according to actual Internet usage and address policy,
being more modest.

This is analogous to the car industry only being capable of
producing one model, and since some people want some things and
others want others, everyone has to buy a Humvee or a Cadillac,
because that is the only model they make.

I think address policy should be very finely tuned to the realities
of router design.  On the RAM list we are contemplating a major,
kludgy (in my view) overlay of tunnel routers, global database, etc.
simply due to the realities of router design not being able to cope
with the demands of the BGP control plane with significantly more
than the current 220k IPv4 BGP advertised prefixes.

The trick would be to get the router people to say, in public, "Our
company would find it difficult and expensive to do XYZ . . . " when
they are sitting opposite their competitors, who might be prone to
say "Sure, we can do that . . .".

Maybe such discussions would need to be under the Cone of Silence -
but would that run foul of anti-trust laws?

Stephen Sprunk wrote, in part:

> The IETF has told vendors to not optimize for any particular
> route length, and so far it appears they're heeding that advice.

If so, I think this would be an abandonment of the IETF's duty to
facilitate the efficient operation of the Internet.  (I don't recall
the RFC which spells this out.)

This would be like telling road engineers not to optimise highways
for any particular type of traffic: sports cars, the largest trucks,
buses, horses with carts, motorbikes, bicycles and trucks carrying
oversize objects like parts of houses etc.

DFZ traffic has a statistical profile, which changes over time.  I
think it is vital to set some limits on certain aspects of that to
facilitate the design of routers which are optimised for the actual
traffic they will handle.

It would be nuts to design a router with enough FIB RAM and
sufficient memory accesses so it could handle full line rate VoIP
packets to /128 prefixes when in reality, this will never be required.

The /48 PI prefix lengths already look really long to me.  At the
very least, I would expect the IETF and the RIRs to be able to say
they won't expect DFZ routers to handle traffic packets addressed to
anything longer than this for a long time - say to 2020.

Many routers will no doubt be able to handle some packets to longer
prefixes - but only due to the choice of the ISP to do so.  If we
insist that this be at no extra cost compared to /48, then the price
of the whole router will go up significantly, with all Internet
users paying for this, when in fact this faster handling of packets
addressed to prefixes longer than /48 is not, or should not be, an
actual requirement of handling DFZ traffic.

  - Robin           http://www.firstpr.com.au/ip/ivip/