Network Working Group                                     I. van Beijnum
Internet-Draft                                             March 4, 2007
Expires: September 4, 2007


                          A mapping protocol
                    draft-van-beijnum-mapper-00.txt


Introduction

This document describes a protocol that distributes mapping information.
The protocol takes a namespace with no explicit hierarchy and allows
data to be attached to any name in the namespace, providing the ability
to query for the data attached to a given name, and be automatically
informed of subsequent changes to this data.
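
As an illustration only, the query-and-notify behavior described above
can be sketched as a small in-memory store. The class and method names
(NameStore, attach, query) are hypothetical and are not defined by this
document; the sketch ignores the network entirely.

```python
from typing import Callable, Dict, List, Optional

Watcher = Callable[[str, bytes], None]


class NameStore:
    """Flat namespace: data can be attached to any name, and queriers
    can ask to be told about later changes to that name."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watchers: Dict[str, List[Watcher]] = {}

    def attach(self, name: str, data: bytes) -> None:
        """Attach (or replace) the data for a name and notify watchers."""
        self._data[name] = data
        for callback in self._watchers.get(name, []):
            callback(name, data)

    def query(self, name: str,
              on_change: Optional[Watcher] = None) -> Optional[bytes]:
        """Return the current data for a name, or None if there is none.
        If on_change is given, it is called on every subsequent change."""
        if on_change is not None:
            self._watchers.setdefault(name, []).append(on_change)
        return self._data.get(name)
```

A querier that passes on_change receives the current value immediately
and every later value through the callback, which corresponds to being
automatically informed of subsequent changes.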

The purpose of this protocol is to distribute the location of
decapsulation devices that are capable of receiving encapsulated IP
packets, decapsulating them and delivering them to the holder of the
destination IP address without the need for a prefix covering this IP
address to be present in the BGP default-free zone.

General overview

The idea is that users of IP address space are connected to one or more
ISPs (the examples will use two) in the following fashion:

         ISP A --- ISP B
       /       \ /       \
User 1          X          User 2
       \       / \       /
         ISP C --- ISP D

where each ISP operates a network with routers that have one (sometimes
more) of three functions:

1. Edge router: this type of router connects to customers
2. Border router: this type of router connects to other ISPs
3. Core router: these provide connectivity between edge and border routers

Edge routers don't necessarily need a copy of the full default-free BGP
routing table; they can forward packets according to a default route if
need be. Also, in most cases, they don't need to be extremely fast
because the majority of customers connect at modest speeds.

Border routers don't need a copy of the full default-free table either;
they only need routes for the destinations connected to the external
networks they interface with. Border routers may need to be fast because
a lot of traffic is exchanged between ISPs.

Core routers need a full BGP table and they need to be fast: all the
routing information and all the traffic come together in these routers.

The idea is to implement encapsulation and decapsulation devices close
to the edge routers, where traffic volumes are relatively modest.
Certain IP address prefixes are then no longer carried in BGP in the
core or border routers. Rather, packets with these destinations are
forwarded to an encapsulation device, which encapsulates the packets in
some fashion and sends the packets on their way as per the encapsulating
protocol and/or the address in the encapsulating header. The examples
will assume IP-in-IP encapsulation, but other types are possible as
well, such as direct / on demand fiber paths or MPLS. When the packets
arrive at the decapsulation device close to the destination, they're
decapsulated and forwarded to the destination using the destination IP
address in the original IP packet. The routers between the
encapsulation device and the decapsulation device now no longer need
to have the IP address prefixes in question in their tables.

The mapping protocol makes sure that encapsulation devices know which
decapsulation devices serve a particular IP destination.
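
A minimal sketch of how an encapsulation device might use this
information follows. It assumes that mappings are keyed by IP prefix and
that the most specific matching prefix wins; this document does not
specify the matching rule, and the function name find_decapsulators is
hypothetical.

```python
import ipaddress
from typing import Dict, List, Optional

# Assumed shape: prefix -> tunnel destination addresses of the
# decapsulation devices that serve it.
MappingTable = Dict[ipaddress.IPv4Network, List[ipaddress.IPv4Address]]


def find_decapsulators(
    table: MappingTable, destination: str
) -> Optional[List[ipaddress.IPv4Address]]:
    """Return the decapsulation devices for the most specific prefix
    covering the destination, or None if no mapping applies (the packet
    is then forwarded normally, without encapsulation)."""
    dest = ipaddress.ip_address(destination)
    matches = [prefix for prefix in table if dest in prefix]
    if not matches:
        return None
    best = max(matches, key=lambda prefix: prefix.prefixlen)
    return table[best]
```

With IP-in-IP, one of the returned addresses would become the
destination address in the encapsulating header.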

Mapping protocol overview

In addition to decapsulation devices that generate mapping information,
and encapsulation devices that use mapping information, there are also
authoritative mapping servers. The idea is that parts of the
name/address space are served by a small number of authoritative
servers. Decapsulation devices register their mappings with these
servers, and encapsulation devices query the authoritative servers. A
final type of mapping server only consolidates, caches and
redistributes mapping information.

Mapping information distributed in the protocol consists of:

1. Source: an AS number, a router identifier and a timestamp
2. Mapping data: for IP-in-IP, this would be the tunnel destination
   address
3. Refer to: a set of addresses that may be contacted for more detailed
   information
4. Source preference: traffic engineering information from the source

This information is put into packets along with an identifier, which is
the IP prefix that the mapping information applies to. On its way from
the decapsulation device to the authoritative server, the data is
accompanied by an HMAC so the authoritative server knows the mapping
data is authorized by the legitimate holder of the address space. A
shared secret is communicated out of band. Additional security
mechanisms may be defined at a later date.
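
The HMAC check could look like the sketch below. The hash function
(SHA-256), the way the identifier and the mapping data are combined into
one message, and the function names are all assumptions made for the
example; this document specifies none of them.

```python
import hashlib
import hmac


def sign_mapping(secret: bytes, identifier: str, payload: bytes) -> bytes:
    """Compute the HMAC a decapsulation device attaches to its mapping.
    The identifier (an IP prefix) is covered so that a valid tag cannot
    be reused for a different prefix."""
    message = identifier.encode("ascii") + b"\x00" + payload
    return hmac.new(secret, message, hashlib.sha256).digest()


def verify_mapping(secret: bytes, identifier: str, payload: bytes,
                   tag: bytes) -> bool:
    """Check the HMAC on the authoritative server, using a
    constant-time comparison."""
    expected = sign_mapping(secret, identifier, payload)
    return hmac.compare_digest(expected, tag)
```

The authoritative server would discard any mapping for which
verify_mapping returns False.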

Authoritative and non-authoritative mapping servers consolidate sets of
mapping data pertaining to the same identifier into a single message.
This is possible because only mapping data with the same source AS and
router identifier overwrites an earlier message. The timestamp is used
to suppress duplicate messages. Mapping servers may also attach local
preference information to aid traffic engineering.
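
The consolidation rule can be sketched as a table keyed by source AS
and router identifier. Treating a message whose timestamp is not newer
than the stored one as a duplicate is an assumption about how the
timestamp is applied, and the class name Consolidator is hypothetical.

```python
from typing import Dict, List, Tuple

SourceKey = Tuple[int, str]   # (source AS, router identifier)
Entry = Tuple[int, str]       # (timestamp, mapping data)


class Consolidator:
    """Keeps one entry per source for each identifier (IP prefix)."""

    def __init__(self) -> None:
        self._by_identifier: Dict[str, Dict[SourceKey, Entry]] = {}

    def receive(self, identifier: str, as_number: int, router_id: str,
                timestamp: int, mapping_data: str) -> bool:
        """Store a message. Returns False when it is suppressed because
        the same source already supplied an equal or newer timestamp."""
        entries = self._by_identifier.setdefault(identifier, {})
        key = (as_number, router_id)
        current = entries.get(key)
        if current is not None and current[0] >= timestamp:
            return False
        entries[key] = (timestamp, mapping_data)
        return True

    def consolidated(self, identifier: str) -> List[str]:
        """All current mapping data for one identifier, as it would be
        placed into a single message."""
        entries = self._by_identifier.get(identifier, {})
        return sorted(data for _, data in entries.values())
```

Messages from different sources accumulate under the same identifier,
while a newer message from the same source replaces the older one.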

Operation

In order to advertise mapping information, a decapsulation device must
set up and maintain a TCP session towards one or more authoritative
servers, possibly aided by intermediate non-authoritative servers.

Encapsulation devices may set up and maintain sessions towards
authoritative servers (in practice, probably towards non-authoritative
caching/redistributing servers). As long as the session remains up, the
encapsulation device may receive updates for the information it
requested earlier. Alternatively, encapsulation devices can do a single
lookup and then connect directly to one or more decapsulation devices to
learn reachability information directly, and/or request information
that is more granular than what the mapping/flooding system can
accommodate. An example of this would be host mobility.

Discussion: for just provider-independent (PI) address space without
multihoming, it's enough to do a single lookup for mapping info, as this
information isn't going to change.
However, for multihoming, it's essential to be informed of changes very
quickly: with only caching, the encapsulation device could be sending to
a dead decapsulation device. An alternative approach would be to always
cache, and let a decapsulation device reach back to the encapsulation
device when the decapsulation device is no longer capable of delivering
packets to a certain destination. With IP-in-IP this is easily done
through the tunnel source address. However, this trades off reliability
of the packet delivery against scalability of the mapper service.
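
The always-cache alternative might be sketched as follows on the
encapsulation device. The reach-back notice itself is not specified in
this document; the method name unreachable and the idea that the notice
names a prefix and a decapsulation device are assumptions.

```python
from typing import Dict, List


class MappingCache:
    """Cache on an encapsulation device: prefix -> decapsulation devices."""

    def __init__(self) -> None:
        self._cache: Dict[str, List[str]] = {}

    def store(self, prefix: str, decapsulators: List[str]) -> None:
        """Cache the result of a single lookup."""
        self._cache[prefix] = list(decapsulators)

    def unreachable(self, prefix: str, decapsulator: str) -> None:
        """Handle a reach-back notice: the decapsulation device can no
        longer deliver to this prefix, so stop sending to it. When no
        device is left, drop the entry so a fresh lookup is needed."""
        remaining = [d for d in self._cache.get(prefix, [])
                     if d != decapsulator]
        if remaining:
            self._cache[prefix] = remaining
        else:
            self._cache.pop(prefix, None)

    def lookup(self, prefix: str) -> List[str]:
        """Return the cached decapsulation devices (empty if none)."""
        return list(self._cache.get(prefix, []))
```

Until the notice arrives, packets are still sent to the dead
decapsulation device, which is the reliability cost mentioned above.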

Note that both encapsulation and decapsulation devices can be
implemented inside ISP networks or in end-user networks, or even in
hosts.

Document and author information

This document expires September 4, 2007. The latest version will always be
available at http://www.muada.com/drafts/. Please direct questions and
comments to the author:

    Iljitsch van Beijnum

    Email: iljitsch@muada.com