[arin-ppml] Routing Research Group is about to decide its scalable routing recommendation

Fri Dec 18 10:51:53 EST 2009

Thanks, Robin, for a very complete reply.  No matter how cantankerous
I may seem in e-mail, I really appreciate it.  There's a lot to
follow here and most folks just don't have the time.

In a message written on Fri, Dec 18, 2009 at 08:20:54PM +1100, Robin Whittle wrote:
> > In looking at the problems with the current Internet it appears
> > most of the blame is laid at the feet of BGP4.  BGP has a number
> > of properties that lead to scaling issues, some well documented,
> > some not so well documented.
> 
> Can you mention those which you think are not well documented?

I think the effects of how ISPs configure BGP on its performance
have been relatively poorly studied.  What I see mostly are lab
tests; there are very few studies tracking CPU usage or propagation
times in the real Internet and relating them back to protocol or
configuration weaknesses.

> Core-edge separation schemes (LISP, APT, Ivip and TRRP) don't alter
> the functions of hosts or most internal routers.  They just provide a
> new set of end-user address prefixes which are portable to any ISP
> with an ETR, and which are not advertised directly in the DFZ.  A
> covering prefix, including many such longer prefixes, is advertised
> by special ITRs ("Open ITRs in the DFZ" is the Ivip term, "Proxy
> Tunnel Routers" is the LISP term) so these routers collect packets
> sent by hosts in networks without ITRs and tunnel those packets to
> the correct address. APT and TRRP have functionally similar
> arrangements for packets sent from networks without ITRs.

I've done some work with the LISP folks, but I'm far from an expert
on that particular example.  At least in the LISP case (and I suspect
in all of these), it feels a lot like squeezing a balloon.  That is,
they do improve the area they are looking to improve, but at the
expense of reducing the performance in some other way.

As a result, as an operator, I find it hard to consider these
solutions "better".  Indeed, unless the resource being optimized is
very scarce, it seems unlikely folks are going to want to transition
to a new technology solely to end up with the same size balloon.
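Robin describes a core-edge separation arrangement in the quoted text: ITRs tunnel packets for end-user prefixes that are not advertised in the DFZ to an ETR. The sketch below shows what one ITR does with one packet. It is a minimal sketch under stated assumptions. The mapping table, the addresses (all from the RFC 5737 documentation ranges), and the function names are hypothetical. The dictionaries stand in for packets and are not the wire format of LISP, APT, Ivip or TRRP.

```python
import ipaddress
from typing import Optional

# Hypothetical ITR address (documentation range).
ITR_ADDR = "203.0.113.1"

# Hypothetical mapping table: end-user prefix -> ETR address.  These prefixes
# would sit under one covering prefix in the DFZ rather than be advertised singly.
MAPPINGS = {
    ipaddress.ip_network("192.0.2.0/24"): ipaddress.ip_address("198.51.100.1"),
    ipaddress.ip_network("192.0.2.128/25"): ipaddress.ip_address("198.51.100.9"),
}


def lookup_etr(dst: str) -> Optional[str]:
    """Longest-prefix match of dst against MAPPINGS; None if unmapped."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in MAPPINGS if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return str(MAPPINGS[best])


def itr_forward(src: str, dst: str) -> dict:
    """What an ITR does with one packet: encapsulate it or pass it through."""
    packet = {"src": src, "dst": dst}
    etr = lookup_etr(dst)
    if etr is None:
        # Not a mapped end-user prefix: ordinary BGP forwarding, no tunnel.
        return packet
    # Map-encap: the original packet rides inside an outer header addressed
    # to the ETR, which strips the outer header and delivers the inner packet.
    return {"src": ITR_ADDR, "dst": etr, "inner": packet}
```

The extra lookup and the extra header on every mapped packet are the other side of the balloon. The DFZ carries fewer prefixes, but the edge now holds mapping state and does encapsulation work.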

> The first difficulty is that a new routing protocol can never be
> introduced to replace BGP4 unless it is fully backwards compatible -
> and no-one has devised such a thing AFAIK.

I strongly disagree with this statement.  While it would be vastly
easier if it were fully backwards compatible, that is by no means a
requirement.  Indeed, many of the schemes proposed are what I would
call less than backwards compatible.  I know folks may look at
map-encap as keeping the host stack the same, but the reality is
that, to the backbone operator, it is a forklift upgrade to new
technology and stands a really good chance of requiring new hardware
(in some cases) at the edge.  Rolling out a new routing protocol,
even one that must replace BGP outright, is no harder.

> The second is that it is pretty tricky to come up with a protocol for
> the Internet's interdomain routing system which could cope with the
> growth in the number of separately advertised end-user networks.
> 
> There could be millions or billions of separate prefixes which
> end-user networks, including mobile devices, need to keep no matter
> where they physically connect to the Net.

Here is where I feel like there is a major disconnect.  Operators
today are concerned with growing the current system: perhaps looking
at a 10-year figure of 1 million IPv4 routes and 300k IPv6 routes,
using more or less the existing schemes.

What the IETF (researchers, vendors, etc.?) seems to be looking at
is how we can give every man, woman, and child a provider-independent
prefix and route all of them.  Your "billions of prefixes" case.

That's worthy work; I'm all for seeing if there is a way to do that
and then assessing if it is a path we want to go down as an industry.
However, I feel that it is coming at the expense of the much less
sexy problem of "how do we keep the status quo going for 10, 20,
or 30 more years".  It's also a harder problem, which means it will
almost certainly take more money and effort to solve.

> Many of the proposals now being made for the RRG process seem to
> involve host changes - specifically making the host responsible for
> more routing and addressing things than in the past.  This is for
> *every* host, not just for mobile hosts.

There is an old belief that the Internet succeeded over some other
technologies in part because "routers are dumb" and all of the
smarts are in the host.  TCP congestion control is often cited as
an example: let the hosts deal with it rather than relying on X.25
or Frame Relay style notification in the middle boxes.

While I think many of the examples given are poor, I think the
premise is right.  Scaling the technology in a PC is vastly cheaper
than scaling it in core routers.  If there is a choice of putting
complexity in the host or in the core router, the host wins hands
down.

The downside, of course, is that there are many more devices to
upgrade, so adoption is much harder.

> Even if these objections could be overcome, I would still object to
> it because it significantly slows down the ability to send the first
> packet of user data.  This slowness depends on the RTT between the
> hosts, and is greatly exacerbated by a lost packet in the initial
> management exchange which must precede (AFAIK) the packet which
> actually contains the user traffic packet.

Having lived through some technologies with this property in the
past (e.g. some of the IP over ATM "solutions") and seeing how it
works in proposals like LISP, I have to say this is an Achilles'
heel in many of the proposals.  This is not just due to first-packet
slowness, but also due to the caching of this information that must
occur.
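The first-packet delay point quoted above can be put as rough arithmetic. This is a minimal sketch under two hypothetical assumptions: the initial exchange is one request/reply round trip, and each lost message costs one fixed retransmission timeout before the retry. The function name and the numbers are illustrative and are not taken from any of the proposals.

```python
def extra_first_packet_delay_ms(exchange_rtt_ms: float,
                                retransmit_timeout_ms: float,
                                lost_messages: int = 0) -> float:
    """Added delay before the first user-data packet can be sent.

    Assumes one request/reply round trip must complete first, and that each
    lost request or reply is noticed only after a full retransmission timeout.
    """
    if lost_messages < 0:
        raise ValueError("lost_messages must be non-negative")
    return lost_messages * retransmit_timeout_ms + exchange_rtt_ms


# Illustrative numbers only: a 150 ms RTT and a 1 s retransmission timeout.
for lost in (0, 1, 2):
    print(lost, extra_first_packet_delay_ms(150.0, 1000.0, lost))
# prints 150.0, 1150.0 and 2150.0 for 0, 1 and 2 lost messages
```

Under these assumptions, a single loss turns a 150 ms wait into more than a second. The timeout, not the RTT, dominates as soon as anything is lost.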

Cache-based router designs went from being the most common to
non-existent about a decade ago, for a wide range of performance
reasons.  It seems to me most of the proposals bring back the
badness in these designs, but rather than having it all in one box,
it's distributed over multiple routers and hosts across the Internet.

I can't wait for the day that routing instability causes worldwide
cache invalidations in some map-encap scheme. :(
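The caching concern above can be made concrete with a minimal map-cache sketch. The sketch makes hypothetical assumptions: each entry has a fixed TTL, and a `resolve` callback stands in for the slow mapping lookup. No real scheme's cache policy is implied. `invalidate_all` models the flush described above. After it runs, every active destination pays a miss, and so a first-packet delay, all over again.

```python
import time
from typing import Callable, Dict, Tuple


class MapCache:
    """Hypothetical map-encap cache: destination prefix -> ETR, with a TTL."""

    def __init__(self, resolve: Callable[[str], str], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve    # slow path: ask the mapping system
        self._ttl = ttl_s          # assumed fixed lifetime of every entry
        self._clock = clock        # injectable so behaviour can be tested
        self._entries: Dict[str, Tuple[str, float]] = {}
        self.misses = 0            # each miss is a packet that had to wait

    def lookup(self, prefix: str) -> str:
        now = self._clock()
        entry = self._entries.get(prefix)
        if entry is not None and entry[1] > now:
            return entry[0]        # hit: forward immediately
        # Miss or expired entry: this packet waits for the mapping lookup.
        self.misses += 1
        etr = self._resolve(prefix)
        self._entries[prefix] = (etr, now + self._ttl)
        return etr

    def invalidate_all(self) -> None:
        """Drop every entry, e.g. after a routing change upstream."""
        self._entries.clear()
```

With a fixed clock, three destinations looked up twice cost three misses. After `invalidate_all`, the same three cost three more. Multiply that by every ITR and host holding such a cache, and a single routing event becomes a burst of lookups and delayed first packets.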

Lastly, a bit of ARIN related content...

What I am hearing is that the research community believes that BGP4
cannot scale to the point where everyone has a provider-independent
prefix, and that if we want everyone to have a provider-independent
prefix, we need a new technology to be in place first.

I think that's a very important message for the ARIN community to
understand.

-- 
       Leo Bicknell - bicknell at ufp.org - CCIE 3440
        PGP keys at http://www.ufp.org/~bicknell/