[ppml] IPv4 swamp, mh end-users & IP architectural solutions: LISP/APT/Ivip

briand at ca.afilias.info briand at ca.afilias.info
Mon Aug 20 01:56:38 EDT 2007


> In "Re: [ppml] Policy Proposal: IPv6 Assignment Guidelines" Brian
> Dickson wrote, in part:
>
>> The *only* problem with the IPv4 swamp, was that there were
>> multiple assignments to ASNs, that couldn't be aggregated by the
>> ASNs.

Note that the scope of my comment was swamp-specific. I was not addressing
the existence of, causes of, or solutions to IPv4 routing problems in general.

And, in fact, the swamp isn't the only source of IPv4 table growth at
excessive rates.

Other causes have been:
- failure to aggregate, deaggregation, and poor aggregation
- inability to "force" third parties to employ BCPs
- post-CIDR allocations that couldn't be aggregated (the swamp was pre-CIDR)
- inability to limit scope of TE-based routing announcements

The last item is addressed in a new proposal, as-path-limit, for
explicitly limiting how far a prefix should be propagated.

> I disagree with the statement that the only problem with IPv4
> routing is that some or many end-user networks (or ASNs in general -
> providers and end-users combined) have more than one BGP advertised
> prefix which are numerically separated and so can't be aggregated
> into a single advertised prefix.  There are also problems with
> stability, excessive updates for some prefixes and the projected
> growth in the number of end-user networks which must be multihomed.

I'll ignore the middle two - they're orthogonal to routing table size
issues.

The last issue, however, is definitely germane.


> My understanding of the consensus on ROAP (Routing and Addressing
> Problem) is that the biggest single problem is the inability of the
> BGP control plane to cope with the growth in the number of
> multihomed end-user networks, which could reach into the millions -
> even if every network only had one advertised prefix.

And here's where thinking in IPv4 mode results in lots of good ideas
chasing a place to live, when there are much simpler ideas that can
be implemented *currently*, with *current* technology, by very *simple*
methods. They don't require any new protocols, and can be done by the
network administrators of the "end-user" networks, with novice-level
skill sets, and without the rest of the world needing to know about them.

Here's where the break-down occurs, on the "millions and millions" issue:
There is *need* for multi-homing, and there is "need" for multi-homing.

The difference is scope.

Any organization which provides services for unaffiliated entities, which
we can generally refer to as an ISP, might currently be single-homed,
but can generally be categorized as *needing* to be multi-homed.
It is reasonable to expect, sooner or later, that every ISP will be
multihomed.

However, the needs of most entities who are entirely self-contained,
as far as multi-homing is concerned, get the "quoted" treatment.
The fact that there is the ability to prescribe an architecture, with
some level of centralized control, means that such an entity can achieve
all the benefits of multihoming, without consuming DFZ routing slots.

I'll give a specific example, below.

I will note in advance, that there is lots of room for improvement over
this scheme.

However, I hope that my example demonstrates that this kind of solution
offers reasonable levels of redundancy on upstream connections,
reachability, services offered to the world, and independence from vendor
implementations and lock-in, and so should be able to achieve
acceptance in the enterprise space. And since I doubt I've covered
all the bases, there are probably lots of other ways to do this kind of
thing, some of which may even be better. :-)

The key thing to remember is - in IPv4, address space is scarce, and not
cheap. So, everyone is under the common misconception that you would only
ever use one IP address (at most) per device.

In IPv6, there's so much address space available, that it is entirely
reasonable to presume multiple IPv6 addresses for everything. And that's
the key to coming up with multi-homed schemes that scale, that work well,
and that don't depend on DFZ slots for the multi-homed.

Example of multi-homing without BGP, in an IPv6 world:

Pick three colours that work for you, to identify address spaces that
company X will use. Call them "red", "blue", and "purple".

Say that X has two upstreams, "R" and "B". R uses (IPv6) address space RED,
and B uses BLUE. R assigns "red" to X, and B assigns "blue" to X, both
out of their respective PI/DFZ blocks.

These two small blocks are PA space. They are presumed to be aggregated
by R and B, and only exist in the DFZ as the aggregates RED and BLUE.

Now, "purple" is a PI block given to X by their RIR.

We will show how X can use "purple" to multi-home, without announcing
"purple" into the DFZ. And, X will be able to do this and achieve all
the benefits of multi-homing.

What X does, is partition "purple" into 2 (or more) blocks.
One block, p1, is used for 1:1 NAT with "red", the other, p2, for 1:1 NAT
for "blue".
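
A minimal sketch of the partition and the 1:1 mapping, in Python. All
prefixes are hypothetical (taken from the 2001:db8::/32 documentation
range), as is the assumption that R and B each assign a /56 and that
"purple" is a /48; the function name translate is also made up for
illustration. It only shows the address arithmetic, not a packet rewriter.

```python
import ipaddress

# Hypothetical prefixes from the 2001:db8::/32 documentation range.
RED = ipaddress.IPv6Network("2001:db8:1:100::/56")    # PA block assigned by R
BLUE = ipaddress.IPv6Network("2001:db8:2:200::/56")   # PA block assigned by B
PURPLE = ipaddress.IPv6Network("2001:db8:ffff::/48")  # PI block from the RIR

# Partition "purple" into blocks the same size as the PA blocks.
_blocks = PURPLE.subnets(new_prefix=56)
P1 = next(_blocks)  # paired 1:1 with "red"
P2 = next(_blocks)  # paired 1:1 with "blue"


def translate(addr, src, dst):
    """Stateless 1:1 NAT: keep the host bits, swap the prefix."""
    if src.prefixlen != dst.prefixlen:
        raise ValueError("1:1 NAT needs equal-sized blocks")
    if addr not in src:
        raise ValueError(f"{addr} is not in {src}")
    offset = int(addr) - int(src.network_address)
    return ipaddress.IPv6Address(int(dst.network_address) + offset)


outside = ipaddress.IPv6Address("2001:db8:1:100::53")  # "red" address seen by the world
inside = translate(outside, RED, P1)                   # what X1 would rewrite it to
print(inside)                      # 2001:db8:ffff::53
print(translate(inside, P1, RED))  # 2001:db8:1:100::53
```

Because the mapping is pure prefix substitution, the reverse direction is
the same function with the two blocks swapped.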

X can have any topology it likes, internally and upstream; we will presume
that for redundancy reasons, links to R and B are on different routers.
X1 connects to R, and X2 to B.

The main thing X will need, is a DNS infrastructure that supports "views",
where the results returned can be configured to vary based on the source
of the query.

The DNS servers will need addresses on all three blocks, "red", "blue",
and "purple". The global DNS served by X, will need to be advertised via
those name servers only. Everything behind/beneath that, will need to be
delegated by, or served by, those name servers, using the same "view"
model.

Any query via X1 needs to be given an answer with short TTL, and an address
on the "red" block. Any query via X2 does the same, only from "blue".
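
As a rough illustration of the "view" selection, here is a Python sketch.
It assumes that "a query via X1" can be recognised by the name server
address the query was sent to (a "red" or a "blue" one); the prefixes,
host names, host offsets and the 60-second TTL are all hypothetical.

```python
import ipaddress

# Hypothetical PA blocks from the 2001:db8::/32 documentation range.
RED = ipaddress.IPv6Network("2001:db8:1:100::/56")   # reached via X1 / R
BLUE = ipaddress.IPv6Network("2001:db8:2:200::/56")  # reached via X2 / B
SHORT_TTL = 60  # seconds; an assumed value for "short"

# Host part of each published name within a PA block (made-up records).
HOST_OFFSETS = {"www.x.example.": 0x80, "mail.x.example.": 0x25}


def answer(qname, arrived_on):
    """Return (address, ttl), picking the view from the server address queried."""
    for block in (RED, BLUE):
        if arrived_on in block:
            return block.network_address + HOST_OFFSETS[qname], SHORT_TTL
    raise LookupError(f"{arrived_on} is not a red or blue name server address")


print(answer("www.x.example.", ipaddress.IPv6Address("2001:db8:1:100::53")))
print(answer("www.x.example.", ipaddress.IPv6Address("2001:db8:2:200::53")))
```

The same name thus resolves to a "red" address when asked via X1 and to a
"blue" address when asked via X2.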

Thus, all other traffic follows the path established by the DNS servers,
and inbound traffic to non-DNS infrastructure gets 1:1 NAT'd to "purple"
addresses at either X1 or X2.

This is all long and confusing, I know - but here's the end result:
- if X1 or R goes away, the global DNS queries will "hunt" to "blue"
addresses for the DNS servers, and "blue" answers will get returned
- for X2/B, vice-versa occurs
- the result of this is that, as long as one of the upstreams is
available, bidirectional availability is assured
- TE types of ingress/egress optimizations can be performed on source
and/or destination by X1 and X2
- because the NAT is 1:1, it is suitable for every kind of service
- any underlying service that cannot tolerate IP NAT, can be un-NAT'd at
the other end, by reversing the 1:1 NAT function (since it is
deterministic)
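
The "hunting" behaviour in the first bullets can be simulated with a toy
Python model. It assumes a resolver simply tries the listed name server
addresses in order and moves on after a timeout (real resolvers choose
servers in more elaborate ways); the addresses and the function name
resolve are hypothetical.

```python
# Name server addresses published for X's zone, one per upstream (made up).
NAMESERVERS = [
    ("red", "2001:db8:1:100::53"),   # reachable only while X1/R is up
    ("blue", "2001:db8:2:200::53"),  # reachable only while X2/B is up
]


def resolve(upstreams_up):
    """Return the colour of the answer a remote resolver ends up with."""
    for colour, ns_addr in NAMESERVERS:
        if colour in upstreams_up:
            # The view behind this address answers with the same colour.
            return colour
        # Otherwise the query times out and the resolver tries the next address.
    return None  # no upstream available at all


print(resolve({"red", "blue"}))  # red
print(resolve({"blue"}))         # blue
print(resolve({"red"}))          # red
```

Since new connections follow whatever the DNS returned, and the TTL is
short, traffic shifts to the surviving upstream once cached answers expire.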

Renumbering becomes very scalable, since the scope of renumbering the PA
space(s) used, is limited to the core portions of the DNS infrastructure,
and the routers X1 and X2 (and X3... if they exist). No renumbering of
anything in "purple" space is ever needed.
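
To make the renumbering claim concrete, here is a small Python sketch of
what changes when X replaces upstream R. The table layout, prefixes and
the renumber helper are hypothetical; the point is only that the PA side
of a pair changes while the "purple" side stays put.

```python
import ipaddress

Net = ipaddress.IPv6Network

# Per-router NAT pairs: (PA block facing the upstream, "purple" block inside).
nat_pairs = {
    "X1": (Net("2001:db8:1:100::/56"), Net("2001:db8:ffff::/56")),
    "X2": (Net("2001:db8:2:200::/56"), Net("2001:db8:ffff:100::/56")),
}


def renumber(pairs, router, new_pa):
    """Swap the PA block on one router; the purple block is left untouched."""
    _old_pa, purple = pairs[router]
    if new_pa.prefixlen != purple.prefixlen:
        raise ValueError("1:1 NAT needs equal-sized blocks")
    updated = dict(pairs)
    updated[router] = (new_pa, purple)
    return updated


# X moves from R to a new upstream that assigns a different /56.
after = renumber(nat_pairs, "X1", Net("2001:db8:3:300::/56"))
print(after["X1"])
print(after["X2"] == nat_pairs["X2"])  # True
```

The DNS views would need the matching update for the new PA block, and
nothing else inside X changes.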

And, all of this, while using PI space under the hood, for convenience
purposes, does not require that the PI space be put in the DFZ.

The un-NAT-ing that may be necessary, for things that require that the IP
address be used as an identifier, does require sharing information about
the PI block - but the scope of that is small compared to the size of the
DFZ.
And more importantly, that can effectively be a "shared secret". It's
a feature, not a bug. The un-NAT stuff can even be implemented on a host
basis, nearly trivially.

I agree (before anyone needs to say it) that this won't work for ISPs,
since the assignments/delegations that exist cross administrative boundaries.

But I'd offer the opinion, that multi-homed non-ISPs are likely to
outnumber multi-homed ISPs, by one or more orders of magnitude.
And techniques such as this are not only necessary to conserve
the DFZ; they can in fact do so.

The fact that this can be done today, is all the more reason to push
the ideas here, and start educating folks on the benefits of 1:1 NAT,
use of PA space, and use of clever DNS tricks to handle reliability
and availability requirements.

Brian Dickson



