<p><font size=1 face="Tahoma">Maybe I'm just that nerdy but I'm slightly
offended by the assertion that so few people own routers with RE/RP's and
a little TCAM. ;)</font>
<p>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">From:</font>
<td><font size=1 face="sans-serif">Leo Bicknell &lt;bicknell@ufp.org&gt;</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">To:</font>
<td><font size=1 face="sans-serif">Ted Mittelstaedt &lt;tedm@ipinc.net&gt;</font>
<tr>
<td valign=top><font size=1 color=#5f5f5f face="sans-serif">Cc:</font>
<td><font size=1 face="sans-serif">arin-ppml@arin.net</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">Date:</font>
<td><font size=1 face="sans-serif">12/14/2009 11:09 PM</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">Subject:</font>
<td><font size=1 face="sans-serif">Re: [arin-ppml] A challenge to the assumption
that a big DFZ is a problem</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">Sent by:</font>
<td><font size=1 face="sans-serif">&lt;arin-ppml-bounces@arin.net&gt;</font></table>
<br>
<hr noshade>
<br>
<br>
<br><tt><font size=2>In a message written on Mon, Dec 14, 2009 at 11:07:10AM
-0800, Ted Mittelstaedt wrote:<br>
> Today I can walk into the store and purchase a PC that has a CPU<br>
> in it that runs at a clock speed of at least 10 times of<br>
> most routers, and has at least 10 times the amount of ram, for<br>
> a quarter of the cost of the annual service contract for most<br>
> DFZ routers (let alone the hardware cost)<br>
<br>
That you're asking this question tells me you don't know how larger<br>
routers (GSR, CRS-1, T640, T1600 etc) are architected at all. Please<br>
don't take that as an insult either, I suspect only a small fraction<br>
of the folks on the list own such routers, and only a much smaller<br>
fraction of those understand how they work internally.<br>
<br>
I'll provide the 10,000 foot view, but beware, that's all it is,<br>
there are a LOT of details at work.<br>
<br>
Let's look at a Juniper T1600. It is an 8 slot box, with each slot<br>
capable of 100Gbits/sec, bidirectional. Hint, 8 * 100 * 2 = 1600.<br>
:) So if you're provisioning 10Gbps ethernet, a fairly fast technology<br>
today, you can put 160 10GE ports in the router.<br>
<br>
You don't route 1.6Terabits/sec on a CPU. Or on several CPU's.<br>
The "open source router" community (see </font></tt><a href=www.vyatta.com><tt><font size=2>www.vyatta.com</font></tt></a><tt><font size=2>,
as an<br>
example) suggests you can software route ~3-4Gbps on a very well<br>
tuned Nehalem CPU. To route 160 10GE ports would take 480 CPU's<br>
at that rate! Even at $500 per CPU, that's $240,000 worth of CPU<br>
alone. Not counting all the bus interconnections, DRAM, etc.<br>
<br>
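As a rough check on that arithmetic, here is a minimal Python sketch. The<br>
function name is made up for illustration, and the 3 and 4 Gbps per-CPU<br>
rates and the $500 price are just the ballpark figures quoted above, not<br>
measurements.<br>

```python
import math

def cpus_needed(total_gbps, per_cpu_gbps):
    """CPUs required to software-route total_gbps, rounding up."""
    return math.ceil(total_gbps / per_cpu_gbps)

ports = 160              # 10GE ports in a fully loaded 8 slot box
total_gbps = ports * 10  # 1600 Gbps of port capacity

for rate in (3, 4):      # claimed software routing rate per CPU, in Gbps
    n = cpus_needed(total_gbps, rate)
    print(f"{rate} Gbps/CPU: {n} CPUs, ${n * 500:,} in CPUs alone")
```

The 480 CPU figure sits inside that range, at roughly 3.3 Gbps per CPU.<br>
<br>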
No, these boxes don't work like that at all. Rather there is a routing<br>
engine (Juniper's term), or route processor (Cisco's term) which runs a<br>
CPU and does BGP with your neighbors. This is the "old, slow
CPU" that<br>
you're referring to in those high end boxes. Truth is though, even
the<br>
"old, slow CPU's" they use could handle several million routes.
All<br>
they do is run BGP, and create from that a single master copy of the<br>
routing table, generally called the FIB, or Forwarding Information Base.<br>
It is the distilled version of the routing table, similar to "show route".<br>
<br>
The CPU then pushes this table to the linecards, into special memory<br>
called TCAM. The TCAM holds entries like:<br>
<br>
10.0.0.0/8 Linecard3Port2<br>
<br>
As packets come in, special hardware looks up the TCAM entry, and then<br>
sends the packet out over the switch fabric to the other cards.<br>
<br>
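To make the lookup concrete, here is a minimal software sketch of a<br>
longest-prefix match against a FIB-like table. Real TCAM compares all<br>
entries in parallel in hardware; this loop only illustrates the result.<br>
Every entry other than 10.0.0.0/8, and every port name besides<br>
Linecard3Port2, is hypothetical.<br>

```python
import ipaddress

# Hypothetical FIB: prefix -> egress port (names are illustrative only).
FIB = {
    ipaddress.ip_network("10.0.0.0/8"): "Linecard3Port2",
    ipaddress.ip_network("10.1.0.0/16"): "Linecard5Port1",
    ipaddress.ip_network("0.0.0.0/0"): "Linecard1Port1",
}

def lookup(dst):
    """Return the egress port of the most specific prefix covering dst."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FIB if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return FIB[best]

print(lookup("10.1.2.3"))    # covered by /0, /8 and /16; the /16 wins
print(lookup("10.200.0.1"))  # covered by /0 and /8; the /8 wins
```

<br>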
TCAM is expensive. Why? Well, consider a linecard in your T1600,<br>
dealing with a 100G (bidirectional) flow.<br>
<br>
That's:<br>
<br>
kilobits megabits gigabits speed bidirectional<br>
1000 * 1000 * 1000 * 100 * 2<br>
<br>
Or 200000000000 bits/sec. Or divide by 8, 25000000000 bytes/sec.<br>
Now, let's say they are all 64 byte packets.<br>
<br>
64 / 25000000000 = .00000000256 SECONDS PER PACKET.<br>
<br>
Let me stack that with 1 nanosecond:<br>
<br>
.00000000100000<br>
.00000000256000<br>
<br>
It's a lookup every 2.56 nanoseconds. This takes arrays of crazy fast<br>
TCAM.<br>
<br>
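The same per-packet budget can be computed directly. This is only a<br>
sketch: it assumes 1 Gbit/sec is 10^9 bits/sec, that every packet is<br>
64 bytes, and that there is no framing overhead. The function name and<br>
its defaults are made up for illustration.<br>

```python
def seconds_per_packet(gbps_one_way, packet_bytes=64, bidirectional=True):
    """Time available per packet when the card runs at full line rate."""
    directions = 2 if bidirectional else 1
    bits_per_sec = gbps_one_way * 10**9 * directions
    bytes_per_sec = bits_per_sec / 8
    return packet_bytes / bytes_per_sec

budget = seconds_per_packet(100)
print(f"{budget * 1e9:.2f} ns per 64 byte packet")
```

Larger packets stretch the budget proportionally, which is why the<br>
minimum-size case is the one the hardware has to be sized for.<br>
<br>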
So long story short, the vendors guess. Say 1,000,000 routes on the<br>
Internet distill into an 800,000 route FIB; they size the TCAM for<br>
that on each linecard. Note that generally TCAM is not socketed<br>
and not field upgradable. Given the speeds it is actually difficult<br>
to socket, and it's very static sensitive for field upgrades. So<br>
it's soldered to the board.<br>
<br>
When the guess, by the vendor or the ISP, turns out to be wrong the<br>
upgrade cost is not the "old, slow CPU"; indeed that is often working<br>
just fine, even if it takes 5 minutes to bring up a full table rather<br>
than the 2 minutes people would like. Rather it's throw out every<br>
linecard and buy new ones. The penalty for guessing wrong is severe:<br>
it's instant, total junking of all the linecards on your network.<br>
<br>
Care to guess what a 10 port 10GE linecard costs for one of these boxes?<br>
I'll assume you get some discount from your vendor, so maybe $400,000.<br>
So your 8 slot box costs $3.2 million to upgrade. Oh, but the new cards<br>
will be more expensive, with more TCAM. Got a network with 200 core routers<br>
(I can think of some ISP's with more, for sure) and you're "only"<br>
talking a 640 million dollar upgrade, for one ISP, just to handle a<br>
larger table.<br>
<br>
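Spelled out as a quick sketch, using only the guessed figures from the<br>
paragraph above (the constant and function names are made up for<br>
illustration, and the price is the assumed discounted one, not a quote):<br>

```python
CARD_COST = 400_000    # assumed discounted price per 10 port 10GE linecard
SLOTS_PER_ROUTER = 8   # linecard slots in the example box
CORE_ROUTERS = 200     # core routers in the example network

def upgrade_cost(routers, slots=SLOTS_PER_ROUTER, card_cost=CARD_COST):
    """Cost of replacing every linecard in the given number of routers."""
    return routers * slots * card_cost

print(f"one router:    ${upgrade_cost(1):,}")
print(f"whole network: ${upgrade_cost(CORE_ROUTERS):,}")
```

<br>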
Before I go any further, I'm going to tell people up front I'm not going<br>
to engage in nit picking over any of the above. If you want to design<br>
core routers go work for Juniper or Cisco, if you can do it for half the<br>
cost of current designs I'm sure they will pay you a nice sum. I'm
also<br>
sure it can be done both cheaper and more expensively, depending on<br>
circumstance. I've picked a run of the mill example, almost every
ISP<br>
is a special case in something.<br>
<br>
So anyway, from the big ISP perspective the situation is this: currently<br>
deployed hardware is what it is. Unless a multi-hundred million dollar<br>
check falls from the sky, it will be what it is until the next, already<br>
planned equipment refresh. Then it will be what the vendor has already<br>
decided the next gen platform will be (you know it takes 3-5 years to<br>
develop a next gen platform, on a quick ramp, right?). Also keep
in<br>
mind some of the TCAM goes to things like MPLS VPN's, which are growing<br>
on their own.<br>
<br>
If these boxes end up exhausting TCAM there will be some upgrades, but<br>
the vast majority of ISP's will turn to filtering to solve the problem.<br>
Remove enough routes so it fits again; at least until the next refresh<br>
cycle.<br>
<br>
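What such a filter looks like varies by ISP. One common approach is to<br>
drop the most specific prefixes first. Here is a minimal sketch of that<br>
idea, assuming a filter purely on IPv4 prefix length; the function name<br>
and the sample table are hypothetical, and a real deployment also needs<br>
a covering or default route so filtered destinations stay reachable.<br>

```python
import ipaddress

def max_prefix_len_that_fits(prefixes, capacity):
    """Longest length limit such that keeping only routes with
    prefixlen <= limit fits in capacity entries; None if nothing fits."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    for limit in range(32, -1, -1):   # try the least aggressive filter first
        kept = [n for n in nets if n.prefixlen <= limit]
        if kept and len(kept) <= capacity:
            return limit
    return None

# Hypothetical five route table squeezed into three TCAM entries.
table = ["10.0.0.0/8", "10.1.0.0/16", "10.1.2.0/24",
         "10.1.2.128/25", "192.0.2.0/24"]
print(max_prefix_len_that_fits(table, 3))
```

<br>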
Lastly, I promise you this, the folks at the top 10 ISP's are all<br>
meeting with Cisco and Juniper several times a year, with real<br>
engineers, not sales folks, and trying to rationalize the cost of the<br>
parts with the needs of the network. They provide lots of engineering<br>
input on the next generation of parts. However, everyone involved
is<br>
having to commit now to how big those TCAM's will be in 3 years on the<br>
next gen cards, which will be in most of the network in 5-7 years.<br>
<br>
Hence my statement on the matter. On some level it doesn't matter
if<br>
the RIR's give away blocks like crazy, or are as stingy as possible.<br>
What matters is that the rate at which blocks are given out roughly<br>
matches the rate that was expected. We can bend the curve, up or
down,<br>
but SLOWLY, as equipment is refreshed.<br>
<br>
There is a wall. It is a 200 foot thick concrete wall. No matter how<br>
hard or soft you hit it, the wall will not move; you will be splattered.<br>
Fill most core router TCAM's and we're all in for a very bad few years.<br>
<br>
-- <br>
Leo Bicknell - bicknell@ufp.org - CCIE 3440<br>
PGP keys at </font></tt><a href=http://www.ufp.org/~bicknell/><tt><font size=2>http://www.ufp.org/~bicknell/</font></tt></a><tt><font size=2><br>
[attachment "attxedn9.dat" deleted by Keegan Holley/SAS/SunGard]
</font></tt>
<br>
<br>