[arin-tech-discuss] Preparation guide for RPKI 'surprise' outages (Was: Notice of upcoming maintenance to ARIN’s RPKI infrastructure)

Job Snijders job at fastly.com
Thu Jun 3 12:03:23 EDT 2021


Dear all,

ARIN announced an upcoming 'surprise' maintenance in July 2021. Full
details have not yet been disclosed - to make it a real surprise! :-)
I think this RPKI experiment is useful, as it can help ARIN better
understand its role and responsibilities in the ecosystem, which will
help it make more informed decisions.

I'd like to share some notes on how to assess your operational model
(aka 'risk') and how to prepare. Preparing for this event will also
help with unannounced surprise maintenances. The checklist below is
probably worth re-confirming every few months in most operations.

A) [ ] Have RPKI ROAs been created for my IP prefixes?

   Check whether anyone in your organization created RPKI ROAs in ARIN's
   online portal. You can check this either by logging into the portal
   or by checking an external tool such as http://irrexplorer.nlnog.net/
   (check for resources where the RIR column shows ARIN, and the RPKI
   column is non-empty).

   --> If your prefixes are not covered by RPKI ROAs, you can stop
       reading: the upcoming maintenance will not affect your routes. <--
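
   For those who prefer to script this check, here is a minimal sketch
   of the coverage test, assuming you have already exported your
   validator's output (Validated ROA Payloads, VRPs) into a list of
   (prefix, maxLength, origin ASN) tuples. The function name and the
   sample data (documentation prefixes and ASNs) are hypothetical, not
   taken from any particular tool.

```python
import ipaddress


def covering_roas(prefix, vrps):
    """Return the VRPs whose prefix covers `prefix`, regardless of origin ASN."""
    net = ipaddress.ip_network(prefix)
    hits = []
    for vrp_prefix, max_len, origin_asn in vrps:
        vrp_net = ipaddress.ip_network(vrp_prefix)
        # subnet_of() raises TypeError across address families, so compare versions first
        if net.version == vrp_net.version and net.subnet_of(vrp_net):
            hits.append((vrp_prefix, max_len, origin_asn))
    return hits


# Hypothetical sample VRPs (documentation address space and ASNs)
SAMPLE_VRPS = [
    ("192.0.2.0/24", 24, 64496),
    ("2001:db8::/32", 48, 64497),
]
```

   An empty result for every prefix you announce means none of them is
   covered by a ROA in the data set you fed in.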

B) [ ] Are my validators up to date?

   Ask the engineering team whether the latest recommended version of
   the chosen validator has been qualified, tested, and burned-in.

   As we don't know /what/ exactly ARIN will change, you'll want to
   use a validator that is known to have a strong cryptographic posture
   derived from well-regarded industry-standard crypto libraries.

   The RIPE NCC Validator is not supported beyond July 1st, 2021, so any
   software defects uncovered by the ARIN experiment will not be fixed
   by RIPE NCC.
 
   I personally recommend using the latest version of NIC.MX's FORT, or
   OpenBSD's rpki-client.

C) [ ] Is my organization monitoring my RPKI ROAs?

   In order for us to be in a position to even complain about an RPKI
   service outage, the RPKI needs to be monitored of course! :-)

   NTT's BGPalerter can be used to monitor both BGP routes _and_ RPKI
   ROAs. This free tool can alert you when RPKI ROAs unexpectedly
   disappear or appear, and can also alert on BGP route visibility.
   Which alerts you get out of the tool will depend on the exact
   failure mode the surprise maintenance triggers.
   
   https://github.com/nttgin/BGPalerter
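
   To illustrate the idea behind this kind of monitoring (this is not
   how BGPalerter is implemented, just a sketch of the concept), one
   can compare two snapshots of validator output. The example assumes
   each snapshot has already been parsed into (prefix, maxLength,
   origin ASN) tuples; the function name and sample values are
   hypothetical.

```python
def diff_vrps(before, after):
    """Compare two VRP snapshots and report what disappeared and what appeared."""
    before_set, after_set = set(before), set(after)
    return {
        "disappeared": sorted(before_set - after_set),
        "appeared": sorted(after_set - before_set),
    }


# Hypothetical snapshots taken before and during a maintenance window
snapshot_before = [("192.0.2.0/24", 24, 64496), ("198.51.100.0/24", 24, 64497)]
snapshot_during = [("198.51.100.0/24", 24, 64497)]
changes = diff_vrps(snapshot_before, snapshot_during)
```

   A non-empty "disappeared" list for prefixes you own is the signal
   worth alerting on.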

D) [ ] Have I correctly configured my BGP Routing Policies?

   It is of paramount importance that operators only use Validated RPKI
   ROA data to reject RPKI invalid BGP routes. A common mistake is to
   configure your EBGP routers to associate a BGP Community (or other
   BGP Path Attribute) with a route dependent on the RPKI validation
   state.

   The problem with associating BGP Communities with the Validation
   State is that any change in the Validation State will trigger BGP
   messages to be sent in all kinds of directions with a new
   ('not-found') BGP Community associated. Use of RFC 8097 Communities
   is also not recommended for the same reasons.

   Many operators attach BGP Communities based on RPKI State to 'see how
   many routes would be affected', but this increased observability is
   itself a cause of routing instability.

   The correct and robust way to configure RPKI ROV in routing policy is
   outlined in this guide (for multiple vendors):

   https://bgpfilterguide.nlnog.net/guides/reject_invalids/

   --> Policies that mark 'valid' or 'not-found' BGP routes with a
       BGP Community will see/trigger BGP routing churn for
       ~ 35,000 BGP routes in the Default-Free Zone. <--
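
   The difference between the two approaches can be sketched in a few
   lines. The example below classifies routes following the RFC 6811
   origin validation procedure and then counts how many routes change
   under each policy when ROAs vanish from the validator output. It is
   a simplified model, not router configuration: the policy functions,
   community values, prefixes, and ASNs are all hypothetical.

```python
import ipaddress


def validation_state(prefix, origin_asn, vrps):
    """Classify a route as 'valid', 'invalid', or 'not-found' (RFC 6811 procedure)."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for vrp_prefix, max_len, vrp_asn in vrps:
        vrp_net = ipaddress.ip_network(vrp_prefix)
        if net.version != vrp_net.version or not net.subnet_of(vrp_net):
            continue
        covered = True
        if net.prefixlen <= max_len and origin_asn == vrp_asn:
            return "valid"
    return "invalid" if covered else "not-found"


def reject_invalids_policy(state):
    """Recommended shape: drop invalids, leave everything else untouched."""
    return (state != "invalid", ())


def tagging_policy(state):
    """Anti-pattern: attach a (hypothetical) community per validation state."""
    tags = {"valid": "64496:1", "not-found": "64496:2"}
    communities = (tags[state],) if state in tags else ()
    return (state != "invalid", communities)


def churn(routes, vrps_before, vrps_after, policy):
    """Count routes whose policy result (accept flag, communities) changes."""
    changed = 0
    for prefix, origin in routes:
        before = policy(validation_state(prefix, origin, vrps_before))
        after = policy(validation_state(prefix, origin, vrps_after))
        if before != after:
            changed += 1
    return changed


# Hypothetical data: one route covered by a ROA, one without any ROA
routes = [("192.0.2.0/24", 64496), ("198.51.100.0/24", 64511)]
vrps_normal = [("192.0.2.0/24", 24, 64496)]
vrps_outage = []  # the ROA has disappeared from the validator output
```

   With this data, churn(routes, vrps_normal, vrps_outage, ...) is 0
   for reject_invalids_policy and 1 for tagging_policy: the formerly
   'valid' route becomes 'not-found', which changes nothing under the
   reject-only policy but changes the attached community under the
   tagging policy.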

   If you are unsure what your routing policy does, feel free to send me
   a copy and I'll read it to confirm with you. Your RPKI-related
   routing policies really should be as simple and short as the guide
   outlines.

E) [ ] Are any of my customer onboarding processes dependent on RPKI?

   Some cloud providers have embraced the practice of requiring their
   Bring-Your-Own-IP (BYOIP) customers to issue RPKI Route Origin
   Authorizations with the cloud provider's ASN.

   Sometimes this is a one-off check. If such a one-off check happens
   during ARIN's surprise maintenance, the check might fail, and thus
   needs to be repeated at a later moment to progress the onboarding
   process.
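
   One way to make such a one-off check tolerant of a temporary outage
   is to repeat it a few times before giving up. The sketch below is
   only an illustration: check_with_retry and its parameters are
   hypothetical, and `check` stands in for whatever ROA lookup your
   onboarding process actually performs.

```python
import time


def check_with_retry(check, attempts=3, delay_seconds=3600, sleep=time.sleep):
    """Run `check()` up to `attempts` times, waiting between failed tries."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            # Wait before the next try; no wait after the final failure
            sleep(delay_seconds)
    return False
```

   The `sleep` parameter is injectable so the waiting can be replaced
   by a scheduler or a test stub.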

----

I look forward to more data and information as it becomes available. I
appreciate ARIN initiating the coordination of some lightweight RPKI
fire drills. :-)

I'm available for questions on-list and off-list.

Kind regards,

Job


