[ppml] A comprehensive discussion of whois and public data.

Leo Bicknell bicknell at ufp.org
Mon Apr 12 13:13:33 EDT 2004


$Author: bicknell $ - $Date: 2004/04/12 17:12:20 $ - $Revision: 1.6 $

I've been doing a lot of research into WHOIS and the various proposals
that have been made.  A history will be published in the next issue
of the ARIN newsletter.  Drawing on that history and all of the
current debates I see a light at the end of the tunnel.  It is my
personal opinion that much of the challenge individuals and their
proposals face is the lack of an overall plan.  A proposed change
may be good for one group, but without updating the other aspects
of policy it may have disastrous implications for other groups.

Fortunately I don't see most of the proposals and desires as all
that incompatible.  While I'm sure my ideas will not make everyone
happy, I think there might be a compromise position hidden in the
details.

To that end, this is a very long, but I hope comprehensive look at
the situation.  This is not a proposal, and is not in the proposal
process at this point.  Indeed, for many reasons I don't know how
to make it a proposal right now.  This document contains several
proposals, and repeals several existing proposals, and all of that
must happen as a package or not at all to make sense.  The current
policy process is not friendly to this sort of multi-part proposal;
indeed, they are often shot down due to a minor flaw in a single
part.  Rather, for now I would like to see if the various
parties do see this as a sort of acceptable middle ground.  Please
think on this before responding.  I attempted to look at the problem
individuals were trying to solve, not their proposed solution.
While I may not have solved your problem your way, I hope that I
have solved it in a way you can accept.

Note, this is NOT a current proposal.  It is NOT on the agenda for 
Vancouver.  I post this now to get feedback, and so I can talk to 
people at Vancouver about these ideas.  

Enough intro, to the meat of the matter...

There is a lack of consensus on the purpose of the "Whois" database.
Indeed, it has led a schizophrenic existence.  It started as the
Internet White Pages, moved to being the single point of all
Domain/IP/ASN info (at InterNIC), and today "whois" properly is no
more than a protocol to get at various databases, one of which is
run by ARIN.

Indeed, let's start with this most basic fact.  It's not the "Whois"
database.  It's a database of ARIN public information that's available
via the WHOIS protocol, also available via a web interface
(http://ws.arin.net/cgi-bin/whois.pl), and in bulk form via FTP
(http://www.arin.net/library/agreements/bulkwhois.pdf).  This is a gross
overloading of terms.  I'll start with a new term:

      ARIN PUBLIC INFORMATION DATABASE

      The ARIN Public Information Database (APID) is a collection
      of information created and collected by ARIN during the due
      course of business which the ARIN membership has deemed public
      information and decided to publish.

There is a corresponding database that also needs to be defined.  ARIN
collects other information in the course of business that does not get
published.  Examples include Network Maps from companies, their credit
card information for billing, and internal contacts not listed in the
public database.  I'll give that a definition as well.

      ARIN CONFIDENTIAL INFORMATION DATABASE

      The ARIN Confidential Information Database (ACID) is a collection
      of information created and collected by ARIN during the due course
      of business which the ARIN membership has deemed is confidential
      information that should be kept under a strict privacy policy.

{Note, a privacy policy needs to be defined for ACID, but I believe that
 is outside the scope of this particular document as it focuses on the
 public information.}

Now that we have precise terms for the groups of data, a statement
can be made about how that data should be published:

      ARIN shall publish the APID in the following methods using
      industry standard practices:

          - Via the WHOIS protocol.
          - Via a query form accessible via the HTTP protocol.
          - Via FTP to users who complete the bulk data form.
          - Via CDROM to users who complete the bulk data form.

      All data provided shall be subject to an AUP.  The AUP shall
      be written by ARIN staff/legal and posted on the ARIN website.
      ARIN may require a signed copy of the AUP before providing
      bulk data.
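
As an illustration of the first publication method, here is a minimal
sketch of a WHOIS client: open a TCP connection to port 43, send the
query followed by CRLF, and read until the server closes the
connection.  The function names are mine, and the server name in the
usage note is an assumption, not part of the policy text.

```python
import socket


def build_query(term: str) -> bytes:
    """Encode a WHOIS query: the search term followed by CRLF."""
    return term.strip().encode("ascii") + b"\r\n"


def whois_query(term: str, host: str, port: int = 43,
                timeout: float = 10.0) -> str:
    """Send one query and return the server's full text reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_query(term))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

A call might look like whois_query("192.0.2.1", "whois.arin.net"),
assuming that host name is where the APID is served.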

So far the definitions have been easy, now the harder parts:  what
goes in the public database, and what uses of that information are
supported?  I'll tackle the latter item first, as it impacts what
goes in the database.

Clearly the first use of this data is to support ARIN's business
of allocating numbers.  ARIN must collect data to implement the
address allocation policies as outlined by the community.  ARIN
then creates data as various items are issued to the ARIN user base.

The second use of data is nearly as important though, and that is
one of community verification.  The community uses the APID, and
other sources, to verify that entities are using the resources they
are supposed to be using.  Since there is no "police force" this
sort of community policing is absolutely required.  Note that today
in the case of RWHOIS the community may have to query a server run
by someone other than ARIN to get some information.

Community verification has two aspects to it.  First, the community
may want a third party to provide some verification that resources
have been allocated.  Second the community may want to verify that
ARIN is doing its job properly.  Unfortunately there is a problem
with the second use.  ARIN may use information in the ACID database
(including but not limited to network maps, business plans, and
customer network information) during the course of business.  Since
these items are not available in the APID it would be impossible
for someone to audit ARIN's track record based on the APID alone.

Due to the fact that it would be impossible to audit ARIN based on
the APID data, and due to the fact that I know of no attempts to ever
audit ARIN (or any other RIR) based on that data, it would seem that
item is of low importance.  Indeed, were an audit to be necessary
an outside party would most likely have to come in with NDA access
to the ACID database for it to be properly performed.

The third use of data is in statistics gathering.  First order
statistics (how much of resource X is in use, how fast is it being
used) are necessary for the community to determine policies for
distributing resources, and to determine when resource pools are
exhausted and new solutions must be found.  The second order
statistics are generally used by third party research firms who
perform some level of data mining on the APID to produce statistics
like the relative uses in different geographic areas, market segments,
and other groupings that don't directly show up in the APID.

The fourth use of data is as a contact database.  For operational
and abuse reasons it may be necessary to find a contact e-mail
address, phone number, or mailing address for the person or entity
who has been assigned space.  The question that arises with contact
information is what level of information is required; however, that
will be addressed with what information goes into the APID, not
with the general categories of data.

In addition to the supported uses, there are some uses that are
expressly prohibited:

Using contact information from the APID to send unsolicited
commercial e-mail, postal mail, or correspondence via any other
method of delivery advertising a product or service should be
prohibited.

No information from the APID should be used to violate any state,
federal, or local law.

This leaves a potentially huge grey area of uses of the APID that
are not on the supported or prohibited list.  All of these uses
should be allowed, but ARIN makes no assurances that the data
in the APID will be fit for any of those purposes, and the
data in the APID may be changed at a later time in a way which
may have adverse effects on these unsupported applications.  Any
users who have a new application they feel should be supported
need to have that application approved by the membership to be
added to the supported application list to ensure it will be
considered with future proposals.

Finally, to offer the above in proposal form:

      ARIN shall make the APID available for the following uses
      (supported uses):

        1 ARIN's use in implementing ARIN policies and other
          business.
        2 Community verification, allowing members of the community
          to confirm the proper users of the various resources ARIN
          controls.
        3 Statistic gathering by ARIN and third parties on resource
          utilization.
        4 As a contact database to facilitate communication with the
          person or entity responsible for a particular resource.

      ARIN prohibits the use of the APID for the following uses:

        1 Sending any unsolicited commercial correspondence advertising
          a product or service to any address (physical or electronic)
          listed in the APID.
        2 Using data in the APID to facilitate violating any state,
          federal, or local law.

      ARIN shall allow all non-prohibited uses of the APID, however
      unless those uses are listed as a supported use the data set
      may be changed in such a way as to render them ineffective,
      or they may be blocked outright as deemed necessary by ARIN
      staff.  Users of applications not listed who want to be sure
      that they are supported should introduce a proposal to add
      their application to the supported list.
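
The three tiers in the text above (supported, prohibited, and
everything else allowed but unsupported) can be sketched as a small
lookup.  This is only an illustration; the identifiers used for the
individual uses are hypothetical labels of mine, not policy terms.

```python
from enum import Enum


class UseStatus(Enum):
    SUPPORTED = "supported"
    PROHIBITED = "prohibited"
    ALLOWED_UNSUPPORTED = "allowed, but may break or be blocked"


# Hypothetical labels for the four supported uses listed above.
SUPPORTED_USES = {
    "arin_business",
    "community_verification",
    "statistics",
    "contact",
}

# Hypothetical labels for the two prohibited uses listed above.
PROHIBITED_USES = {
    "unsolicited_commercial_correspondence",
    "facilitating_illegal_activity",
}


def classify_use(use: str) -> UseStatus:
    """Return the tier a proposed use of the APID falls into."""
    if use in PROHIBITED_USES:
        return UseStatus.PROHIBITED
    if use in SUPPORTED_USES:
        return UseStatus.SUPPORTED
    # Anything not listed is allowed, with no promise it keeps working.
    return UseStatus.ALLOWED_UNSUPPORTED
```

Moving an application from the last tier to the first is what a
proposal to add it to the supported list would accomplish.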

Now, last but not least, since we know what are the supported uses
of the APID we can look at what information is required to be in the
APID to support each point.  Clearly the data set that needs to be
supported in total is the union of all of the individual data sets.

Supported use #1 - No data needs to be listed in the APID to support
  ARIN's implementation of policies.  All data could be listed in the
  ACID for these purposes.

Supported use #2 - For community verification all resources managed
  by ARIN need to be listed, along with contact information.  A
  mechanism should be supplied to allow the delegation by resource
  holders for subsets of the resource where desired by the holder.
  For the purposes of verification delegation of subsets is completely
  optional.

Supported use #3 - Statistics are one of the biggest problems.  First
  order statistics are generally published by ARIN outside of the APID
  or ACID, and there is little need for that data to be replicated by
  outside parties.  Of more interest is second order data.  In order
  to get the most useful second order statistics the groups gathering
  that data would like the largest amount of information present in
  the database.  Aside from the ARIN membership using some of the
  statistics presented from third parties, there has been no consensus
  to date on ARIN funding or making available specific data for those
  uses.

  As a result, while statistics should be a supported use, they should
  not at this time influence what is, or is not, in the APID.

  ARIN may want to develop a program such that under NDA and other
  controls approved organizations can access portions of the ACID
  database in order to get additional data.

Supported use #4 - For use as a contact database, all resources
  managed by ARIN need to be listed, along with contact information.
  In addition, policies and procedures need to be in place to keep
  that contact information up to date.  Realizing that contact information
  is a management burden, ARIN should support the minimum set of data
  that allows for generally accepted methods of communication.

  To that end, ARIN shall list e-mail, phone, and postal contacts
  for all direct resource delegations, and have a procedure to
  periodically verify that information.  The contact must verify
  that they are responsible for the resource in question, and that
  the contacts listed are the correct ones for dealing with any
  issues resulting from the use of that resource.  A mechanism
  should be supplied to allow for the delegation of a subset of a
  resource along with contact information.  Delegated contact
  information shall be verified using the same procedure as direct
  delegations.  These subdelegations should be marked as able to
  be further sub-delegated or not.
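
To make the "union of the individual data sets" concrete, here is a
small sketch.  The field names are hypothetical shorthand of mine:
"resource" for the delegated resource and "contact" for the e-mail,
phone, and postal contacts.  Uses #1 and #3 contribute nothing, per
the reasoning above.

```python
# Data each supported use requires in the APID (hypothetical names).
REQUIRED_FIELDS = {
    1: set(),                    # ARIN business: could live in the ACID
    2: {"resource", "contact"},  # community verification
    3: set(),                    # statistics: should not drive content
    4: {"resource", "contact"},  # contact database
}

# The APID must carry the union of what every supported use needs.
apid_fields = set().union(*REQUIRED_FIELDS.values())
```

Under these assumptions the union is simply the resource plus its
contact information, which is the minimal data set argued for here.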

A sticky point in e-mail, but much more so in some of the public
policy meetings, is that there is no agreement on the valid uses
for APID data.  On one end, some want the APID to revert back to its
earlier behavior of being a sort of white pages for the Internet.
The opposite view is that only the highest level of information
should be in the database, and that any further information should
be sought from the organization listed in the database.

The first thing to do is look for precedent.  While the initial
database was all encompassing, as early as 1994 that started to
change in policy (and possibly a bit earlier in practice).  An
interesting case study here is the similar-but-different case of
DNS names.  If "harvard.edu" delegates "cs.harvard.edu" to the CS
Department for use, there is no requirement DNS Registrars be notified.
There is no protocol requirement to put contact information (or
anything besides basic nameserver IPs) in the protocol.  There is
the ability in the protocol (via RP records, and TXT records) to
list that information.  However, if that information is not listed
and you want to know something about "cs.harvard.edu" you must back
up one level in the tree to "harvard.edu", contact them and hope
they will pass you along to the right people.

Perhaps more interesting from a practical point of view is that a
smaller amount of valid contact information is generally much more
useful than a larger set of invalid data.  In addition, there is a
cost to maintaining valid data, in that ARIN must expend resources
keeping the data set up to date.  All of these point to a minimal,
but highly verified and accurate data set as being the cheapest
solution that also provides the highest probability of getting a
valid contact.

As a last point on this subject, there are cases where the entity
that has been given use of a resource, and the person who should
be listed as a contact for a resource are different.  For instance,
managed services companies often allocate resources for their
customers, but are tasked with answering all external queries about
those resources, and either answering them directly for their client
or forwarding them to their client as appropriate.  The system
adopted should support these situations, as having the proper
contact in these cases is far more likely to lead to useful information
than having the actual user of the space listed.

Finally, this leads to the policy of what should be in the APID:

     ARIN shall publish verified contact information and the
     resource(s) allocated (including identification for that
     allocation, like date of allocation or other information
     identified by ARIN) in the APID in the following cases:

         - All resources delegated by ARIN.
         - If allowed by the parent delegation, and requested by the
           contact listed with the parent, a subdelegation of a resource
           originally delegated by ARIN.

     ARIN shall ensure all contact information in the APID is
     verified from time to time and is correct to the best of ARIN's
     ability.

To comment on the implementation:  I suspect SWIP as it exists today
would remain, simply having two flags added to the template,
"Make public" and "allow downstream to SWIP".  Both would default
to no, making all information appear only in the ACID.  Sites using
rwhois could choose to restrict access to their rwhois server to
ARIN staff netblocks only if they do not wish to make their information
public.
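
A rough sketch of how the two flags might interact with the
publication rule, and of the DNS-style "back up one level" lookup
when a record is not public.  The class and function names are mine,
and checking only the immediate parent's flag is one possible reading
of "allowed by the parent delegation".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delegation:
    resource: str
    contact: str
    parent: Optional["Delegation"] = None  # None: delegated by ARIN
    make_public: bool = False              # defaults to "no"
    allow_downstream_swip: bool = False    # defaults to "no"


def in_apid(d: Delegation) -> bool:
    """True if this record would be published in the APID."""
    if d.parent is None:
        return True  # all resources delegated by ARIN are published
    # Subdelegation: the parent must allow it and the record must ask.
    return d.make_public and d.parent.allow_downstream_swip


def public_contact(d: Delegation) -> str:
    """Walk up the tree to the nearest contact that is published."""
    node = d
    while not in_apid(node):
        node = node.parent
    return node.contact
```

With both flags left at their defaults, every query about a customer
block resolves to the contact of the direct resource holder.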

Last but not least, some teeth are needed.  Without them companies
could simply not comply with the policies.  As such a mechanism
is needed to punish those who do not comply.

There are two ways ARIN may find non-compliance.  First, ARIN may be
unable to verify contact information during the verification process.
In this case the resource should be put in a suspended state, and
the parent for that record should be contacted.  In the event there
is no response to repeated attempts after the resource has been
suspended the resource shall be revoked.

The second method is that ARIN may be notified that someone is
unable to contact an entity.  The entity reporting the problem must
show proof of attempting to make contact via two different methods.
ARIN should then follow the standard contact verification procedure
to verify the contact, and if verified seek explanation as to why
there was no response.

ARIN may set a threshold after which repeated reports by third
parties will result in suspension, even if verification succeeds.
ARIN may also set a threshold after which it can ignore notices
from those sending incomplete reports, or reporting organizations
which can document responses.


The proposal:

     If ARIN is unable to verify contact information via the normal
     verification procedure ARIN shall attempt to notify the parent
     of the resource to have the information updated.  If there is
     no parent, or if the data is not corrected in a reasonable
     amount of time the resource shall be SUSPENDED.

     Once the resource is suspended ARIN shall make one more request
     of all contacts listed with the resource and the parent resource
     (if available), and if no response is received in a reasonable
     amount of time the resource shall be reclaimed.

     Third parties may report the inability to make contact with a
     party via information in the APID.  In this case ARIN shall
     attempt the contact verification procedure for that contact
     immediately.  If a response is received, ARIN should document
     that a problem occurred, and the response from the resource
     holder.  Offenders who fail to respond to third parties more
     than 4 times per month for three months may have their resources
     reclaimed at the discretion of ARIN staff.

     If a third party submits reports of the inability to make contact
     that are subsequently disproven, ARIN may choose to ignore reports
     from specific companies, people, e-mail addresses, or any other
     classification means as appropriate.

     The ARIN staff shall publish the time thresholds and procedural
     details to implement this policy on the ARIN web site.

     If a resource is reclaimed under no circumstances shall the
     holder of that resource be entitled to a refund of any fees.
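
The enforcement steps above can be read as a small state machine.
The sketch below is one interpretation, not the procedure ARIN staff
would publish.  It assumes that a successful verification returns a
resource to the active state, and that "more than 4 times per month
for three months" means three consecutive months each with more than
four reports.  All names are hypothetical.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARENT_NOTIFIED = "parent notified"
    SUSPENDED = "suspended"
    RECLAIMED = "reclaimed"


def next_state(state, verified, has_parent=True, deadline_passed=False):
    """Advance one step after a verification attempt."""
    if state is State.RECLAIMED:
        return State.RECLAIMED  # reclamation is final
    if verified:
        return State.ACTIVE     # assumption: verification restores
    if state is State.ACTIVE:
        # No parent to notify means immediate suspension.
        return State.PARENT_NOTIFIED if has_parent else State.SUSPENDED
    if not deadline_passed:
        return state            # still within the "reasonable" time
    if state is State.PARENT_NOTIFIED:
        return State.SUSPENDED
    return State.RECLAIMED      # suspended and the final request failed


def exceeds_report_threshold(monthly_counts, per_month=4, months=3):
    """True if each of the last `months` months had > `per_month` reports."""
    recent = monthly_counts[-months:]
    return len(recent) == months and all(c > per_month for c in recent)
```

The threshold only makes reclamation possible; under the proposal it
remains at the discretion of ARIN staff.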

The proposals listed above overlap with some other proposals already
passed and in the process.  They are enumerated below, along with
the reasons they should be abandoned or repealed.

* 2002-3 - Residential Customer Privacy.

  This policy should be repealed, as under my proposal there is no
  requirement to list residential customers at all.  These policies
  do not conflict, but do not make much sense together.

* 2002-4 - Bulk Copies of ARIN's whois

  This policy should be repealed, as under my proposal there is a
  definition of when an AUP is needed.  These two policies potentially
  conflict.

* 2002-8 - Privatizing POC Information

  This policy should be repealed.  Under my proposal a company may
  make available contacts that are only listed in the ACID.  My
  proposal is also flexible in that ARIN staff can allow many more
  contacts than are on the existing template into the ACID as necessary
  without them appearing in the APID.  Indeed, I expect ARIN staff to
  encourage users (via new forms and methods) to list role accounts in
  the APID, while providing individuals for the ACID, which should
  make some interactions much easier for the ARIN staff.

* 2003-9 - Whois Acceptable Use Policy (AUP)

  This policy should be abandoned.  My proposal requires an AUP as this
  proposal does, but leaves it up to ARIN staff and legal to both write
  the policy and keep it up to date.  ARIN staff may want to use this
  proposal as a template.

* 2003-16 POC Verification

  This policy should be abandoned.  My proposal includes POC
  verification,  and allows ARIN staff to define the procedure and
  publish it on the ARIN web site.  ARIN staff may want to use this
  proposal as a template.

* 2003-5 Distributed Information Server Use Requirements

  This policy should be MODIFIED.  This proposal covers many items,
  most of which are covered in my proposal, however there is one
  important item which is not.  This policy requires rwhois servers
  to be available 24x7 to the public.  This is an important proposal,
  and should continue on that point alone.  If an entity is going
  to publish information via rwhois they must take commercially
  reasonable actions to make it available 24x7.  Adding that
  requirement to my proposals would put it in the wrong place.

* 2003-11 Purpose and Scope of WHOIS Directory

  Already abandoned.

* 2001-7 Bulk ARIN Whois Data

  Already obsolete.


Last, but by no means least, a look at the impact of these changes.
First, it is expected there will be no substantial differences in
the information collected by ARIN.  ARIN will still need the same
basic data, and will probably require the same level of detail in
the form of SWIP or RWHOIS.  The major change is that most of that
data will now reside in the ACID, and not the APID.  Generally this
should allow ARIN more freedom in the internal representation of
this data, and should also allow for enhanced privacy.  This privacy
comes in two forms:  individuals and corporations do not have to
have their information published directly as long as they make
arrangements with someone higher in the hierarchy to be responsible
for external queries, and businesses can keep more of their network
data private, protecting business plans and such.

At the same time privacy increases, so does the ability to use this
data for various purposes.  Since contact information is verified,
and the contact must take responsibility for the space under policy,
there is a much higher likelihood that the contact information will
lead to someone who can help, or who can respond to escalation
(e.g., legal proceedings).  No longer will contact information be
listed for people who have no intention of responding, nor will it
become stale.  Further, if an abuse of the system is detected ARIN
will have the ability by policy to terminate that user.

-- 
       Leo Bicknell - bicknell at ufp.org - CCIE 3440
        PGP keys at http://www.ufp.org/~bicknell/
Read TMBG List - tmbg-list-request at tmbg.org, www.tmbg.org