[ppml] Re: [address-policy-wg] Is the time for conservation over?

Owen DeLong owen at delong.com
Tue Oct 28 01:17:32 EST 2003

> I see your point. Let me ask you this: I am a multinational organization
> (non-ISP). I have offices all over the world including in New York and
> Hong Kong. In New York, I buy transit from ISP A and ISP B. In Hong
> Kong, I buy transit from ISP C. (the reasons I don't buy transit from
> ISP A in HK can be diverse: maybe ISP A's network in Asia stinks, maybe
> they give me a good deal in the US but not in HK, etc). Should I need
> multiple ASes for this? And even if you say so, convince me why I should
> listen to you when a single AS is so simpler for me.
Yes.  Because the RFC definition of an AS is a collection of routes under
a single, consistent routing policy, and BGP assumes that all border routers
within an autonomous system can reach each other via an IGP.  So, since you
are advertising two different route sets from two disconnected places,
you have two autonomous systems by the BGP and RFC definitions of what an
autonomous system is.  Doing it with a single AS is not only no easier, it
actually doesn't work correctly and causes problems.  In some cases it
mostly works, so you don't notice the problems except in rare
circumstances, but you are definitely breaking things, at least from an
internet-standards perspective, if not a pragmatic implementation one.

>> This also assumes that the majority of large organizations don't
>> want to transport traffic internally (I don't know how true/false
>> this assumption is for various organizations).
> Not when they can avoid it, this is a matter both of money and latency.

That's simply not necessarily true.  For example, if you are a multi-
national ISP with an OC-192 backbone, you probably don't want your packets
from Hong Kong to New York traversing random other providers' backbones.
If you are a multi-national corporation, you may have reasons for preferring
to keep as much traffic on your own backbone as possible.  There are
different flavors of need and desire.  I understand that your situation
leads you to want to let others carry your traffic for you.  In other
situations, that cost is not necessarily the primary concern (and in some
cases is not even affected by this decision), and latency may drive some
organizations to run their own backbone, depending on their presence in
the places they need to be and the required topology.

> Back to the example above: I am a multinational organization (non-ISP).
> I have offices all over the world including in New York and Hong Kong.
> Imagine the case of a customer of ISP A that is in HK and that wants to
> access my HK site. Regardless of the fact that I have different ASes or
> not, if I don't announce the HK block longer than my global block, this
> is doubly bad because a) the traffic from ISP A's customer in HK goes
> all the way to NY on ISP A's backbone and comes back on my internal
> network; very bad for latency; b) I also have to carry the traffic back
> on my own network, which costs money. Conclusion: the only traffic one
> wants to transport on the internal network is internal traffic.
If you have different ASes, they should have separate blocks.  There is
no advantage to a single block if it is not a single autonomous system;
it makes much more sense for you to have two smaller blocks than one
larger block.  If ISP A wants to make them contiguous so that you could
aggregate them if you later formed a single AS, that's fine.  If not,
that's fine too.  There's NO gain to an aggregateable prefix if it
doesn't meet the single-AS test above.

> Another topic:
> In New York, I buy transit from ISP A and ISP B. In Hong Kong, I buy
> transit from ISP C. I want to announce the specifics of the HK block in
> both places (for redundancy; if ISP C in HK tanks but my internal
> network and links are still up I can do the scenic routing mentioned
> above; not good but okay as a backup). If announce the HK block with two
> different ASes, I have an inconsistent AS issues (same block sourced by
> different ASes).
Unless you have a backbone to carry that traffic, announcing it for
redundancy doesn't work.  If you are a single contiguous AS announcing all
the same prefix(es) at all your EBGP peering points, then you meet the
single-AS test above.  If you want to use external connections to deliver
that traffic when they will work, instead of your own backbone, then you
should create external tunnels between your border routers, run an IGP
across them, and then run your IBGP on top of that.  You should not be
announcing different sets of routes from different locations.  You should
announce proper MEDs on the routes you do announce, and announce the same
set of longer prefixes at each location.
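
To make that concrete, here's a rough sketch of the NY side in Cisco-style
configuration (the AS number, addresses, and interface names here are all
invented for illustration): a GRE tunnel to the HK border router, the IGP
extended across it, and the iBGP session riding on top between loopbacks:

	! NY border router (hypothetical addresses throughout)
	interface Tunnel0
	 ip address 172.16.0.1 255.255.255.252
	 tunnel source Serial0/0
	 tunnel destination 198.51.100.2
	!
	router ospf 1
	 network 172.16.0.0 0.0.0.3 area 0
	 network 10.0.0.1 0.0.0.0 area 0
	!
	router bgp 64600
	 neighbor 10.0.0.2 remote-as 64600
	 neighbor 10.0.0.2 update-source Loopback0

The HK router mirrors this.  The point is that the iBGP session runs
between loopbacks the IGP can reach over the tunnel, so the two sites
really are one autonomous system again.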

> How do you address this (in IPv4 for a start)?
See above.

> As far as I know the only available way today is to announce longer
> (specific) prefixes with the same AS, prepending it differently for
> different blocks at different locations. What am I missing?
Prepending is an ugly alternative hack to MEDs, which are the right way
to do this.  I realize most people use prepending because:

	1.	Most people don't understand MEDs.
	2.	Most providers don't listen to MEDs because their customers
		don't tend to get them right.
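
For what it's worth, the configuration difference is small.  In
hypothetical Cisco-style syntax (route-map and prefix-list names and the
AS number are invented), the MED version tells the upstream where you'd
prefer traffic to enter; the prepend hack just makes one path look
artificially longer to everybody:

	! The right way: set a MED (lower metric = preferred entry point)
	route-map TO-ISP-A-MED permit 10
	 match ip address prefix-list HK-BLOCK
	 set metric 50
	!
	! The ugly hack: prepend your own AS to deprecate this path
	route-map TO-ISP-A-PREPEND permit 10
	 match ip address prefix-list HK-BLOCK
	 set as-path prepend 64600 64600 64600

Of course, the MED only helps if ISP A actually compares MEDs on your
routes, which is exactly problem 2 above.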

>> As to GLB, you are not making sense to me.
> Maybe by GLB we don't speak of the same thing. I'm not talking about
> akamai-like things that trick DNS in one way or the other and do some
> custom RTT measuring over UDP from their different hosting farms.
Neither was I.

> Keep in mind where this started: it would be easier to aggregate IPv6
> than it is to aggregate v4. WRT what I wrote above, explain me why.
I didn't say that.  I said that there is no need for a single autonomous
system to have more addresses than a /32 would allow, and I cannot imagine
a case where a single AS needs more than a /48 unless they are an ISP
with /32 customers.
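
The prefix arithmetic behind that claim, for reference:

	/48:  2^(64-48) = 65,536 possible /64 subnets
	/32:  2^(48-32) = 65,536 possible /48 assignments

So a single non-ISP AS with a /48 has 65,536 subnets to work with, and an
ISP with a /32 can hand a /48 to each of 65,536 customers.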

>> [Teredo used as NAT traversal]
>> I'm not convinced that is entirely true... I'm sure they could
>> have tunneled it all across port 80 (Micr0$0ft is getting very
>> good at hiding the entire IP stack inside HTTP(s), and it's an
>> increasingly disturbing trend).
> I hear your point, but in this case it is.
I believe that.  Like I said, I'm sure Micr0$0ft will adapt and break the
internet protocol stack in any way they feel they need to in order to get
their application sold.  They have demonstrated a complete and total
disregard for standards in the past any time they became inconvenient.
For some reason, I think M$ is hoping that this will drive the adoption
of V6.  I don't know why M$ wants to drive V6, but, if they want it, it
may not be good for the internet.  (When was the last time M$ wanted
something that turned out to actually be good for the internet?)

>> [multi-address]
>> But, that appeared to be what people were saying was the
>> solution for V6 multihoming.
> The same brilliant minds that never configured a router in their entire
> life, that say that IPv6 renumbering is easy (see below) and that
> designed IPv6 never thinking of it as a product that needed to be
> deployed in the real world, ignoring basic market realities and the fact
> that some pieces of it like multihoming are still missing.
OK... So this gets me back to thinking V6 isn't ready for public
consumption, and wondering why we should waste time worrying about it
until they get it ready.

>> I'm just trying to figure out what it takes for V6 to have a
>> reasonable (the router can parse it and route packets) routing
>> table _AND_ allow reasonable multihoming (at least as good as
>> what is achievable today).
> Nothing is going to be as good as what we have today. MHAP does solve
> the reasonable routing table, but at the cost of some added complexity.
> No free lunch. One of the things that could have made MHAP worth the
> trouble is that it provided not only a solution at the routing table
> size but also other perks such as survivability of open sessions in any
> failure mode, something that BGP is far from delivering.
That's absurd.  What we have today is largely broken, and we should be
striving to fix that with V6.  If we can't do at least as well as we
do today, it will be an operational non-starter.  As to the rest, I
agree, if the convergence time is fast enough, which, from what I saw of
MHAP, was UNLIKELY at best.

>> I took the multi-address position from someones paper on "how to
>> do it" and didn't develop it myself. I have no religion either
>> way.  If there is a single-address way to do it, I'm all for that.
>> What is it?
> Currently there is none; MHAP would have been the closest I could think
> of. The problem of the multi-address solution is not that it's bad; it's
> not. It's OK for certain types of setups such as home/soho, but not for
> large setups. Early on the ipv6mh days we quickly came to the conclusion
> that "THE" IPv6 multihoming solution did not exist; an assemblage of
> different collaborative but distinct solutions targeted at different
> multihoming needs was the best we hoped for.
I don't see why it's all that bad for large setups.  You simply end up
with two or more network addresses for each network.  This technique is
well understood in the v4 world.  Heck, I'm even looking at the possibility
of doing something like this (an ugly hack, but it's all 1918 space, so
who cares) on a collection of servers that absolutely, positively have to
be reachable as long as they are up: two physical interfaces, each with a
similar prefix (say 10.1.1.n/24 and 10.1.101.n/24), and a related loopback
address of 10.1.202.n/32.  They'll all run Zebra to advertise the /32 to
the routers reachable via both physical interfaces, and they'll have
ip_forwarding off on the host.  The routers will only advertise the /32s
to the other routers on those two networks.  (Two sets of two routers,
one set on each network, all connected to each other in a point-to-point
mesh.)

DNS will resolve to the loopback address 10.1.202.n (there will also be
names for reaching a particular PHY).  This is anycast-style host
connection redundancy.  It could be made somewhat more pathological with
redundant servers using anycast layered on top of this (multiple physical
machines all answering the same 10.1.202.n address), but that's not good
for long sessions, just single-packet transactions.
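
A minimal sketch of the host side of that setup in Zebra-style
configuration (using the example addresses above with n=5; the interface
names and the choice of OSPF as the IGP are my assumptions):

	! zebra.conf -- the shared service address lives on loopback
	interface lo
	 ip address 10.1.202.5/32
	!
	! ospfd.conf -- advertise the /32 out both physical interfaces
	router ospf
	 network 10.1.1.0/24 area 0
	 network 10.1.101.0/24 area 0
	 network 10.1.202.5/32 area 0

With ip_forwarding off, the host originates the /32 but never transits
traffic; if one physical path dies, the route via the other simply takes
over.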

>> Renumbering a large network is painful in V4.  V6 was supposed
>> to have pretty much fixed that.
> Absolutely not. Given the current network management practices and
> tools, renumbering an IPv6 network is _not_ significantly easier than an
> IPv4 network.
That's pretty unfortunate.

>> If it didn't, V6 needs more development work until that is fixed.
>> Renumbering is a common occurrence for a variety of reasons and
>> we should develop tools to make it not painful.
> I agree, but IPv6 by itself is not one of these tools. The myth of IPv6
> easy renumbering comes from two things: a) stateless autoconfig, which
> was nice before we had DHCP, and easy support for multiple addresses per
> interface, none of which are any kind of a breakthrough today. If you
> have time, read:
> http://www.ietf.org/internet-drafts/draft-baker-ipv6-renumber-procedure-
> 01.txt
I don't, and I thought V6 also provided for the host-address assignment on
the address server (DHCP replacement) being itself dynamic (this is not
common in today's world), and that V6 renumbering would be more like
renumbering an AppleTalk network (I'm not defending AppleTalk as a
protocol, but they did have address assignment and renumbering pretty
well worked out).

>> However, in my opinion, multihoming in the traditional sense of the
>> word, a single AS attached to more than one upstream transit AS, is
>> the harder of the problems to solve, and, it is not clear to me how
>> this works in V6.
> It does not, because today one can't obtain portable address space that
> could be announced to multiple transit ASes and announcing a PA block
> you got from one upstream to another upstream does not register.
Then V6 either needs to come up with an alternate solution or face
obsolescence before adoption.

>> This doesn't bode well for IPv6 being adopted any time soon. It
>> sounds like there are still many real world operational problems
>> that are left as an exercise to the implementer.

OK... At least I know where things stand a little better now.  I had higher
hopes.  There were some smart people working on this (and, at one time, many
of them actually were people with OP-EX).

