Nimrod

A New Routing and Addressing Architecture for the Internet

J. Noel Chiappa

"In anything at all, perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away..."
	-- Antoine de Saint Exupery, "Wind, Sand and Stars"
		(quoted in "Multics: The First Seven Years",
		 F. J. Corbato, J. H. Saltzer)
"One ring to rule them all, one ring to find them, one ring to bring them all, and in the darkness bind them."
	-- J. R. R. Tolkien, "Lord of the Rings"

Introduction

What is Nimrod about?

Nimrod is a routing architecture, not a routing protocol; done by a system architect, not an algorithmic expert.

In other words, don't expect to see:

New path-finding (routing) algorithms
New abstraction algorithms

Expect rather to see:

Ideas on what the overall functionality of routing ought to be
Ideas on how routing ought to interact with the rest of the internetwork layer
Ideas on how to organize the internals of the routing functionality

Internet Evolutionary Needs

The Internet was not designed to grow to its current size or importance. The Internet desperately needs several things:

A new addressing architecture
Recognition and explicit naming of some new fundamental abstractions
A much improved routing architecture
A resource allocation architecture
A security architecture
Accounting

Routing and Addressing Architectures

What are "routing and addressing architectures"?

An addressing architecture includes:

A system of naming networks, interfaces, topology aggregates, etc

A routing architecture is not simply a protocol (i.e. a set of packet formats, and instructions on how to process them). Rather, it considers the entire subject of how the network organizes the handling of user traffic.

A routing architecture includes:

A way of exchanging information as to where these named things are
A way of computing and selecting paths between sets of communicating entities
A way of causing traffic to take these paths

Examples and Non-Examples of Routing Architectures

Routing protocols:

BGP is a routing protocol
iBGP is a routing protocol
OSPF is a routing protocol
RIP is a routing protocol

Routing architectures:

Nimrod is a routing architecture
The entire collection of BGP, etc, along with the specifications for how they interact, is a routing architecture (albeit an incredibly poorly organized and lashed-together one, not to mention one without many desirable capabilities)

Architectural Discussion Fundamentals

Confusion in Terminology and Concepts

I am far from thinking that nomenclature is a remedy for every defect in art or science: still I cannot but feel that confusion of terms generally springs from, and always leads to, confusion of ideas.
	-- John Louis Petit, Architectural Studies in France, 1854

Much of the confusion in discussion of routing results from varying meanings for terms, particularly address. This happened, in part, because the set of fundamental objects recognized was not rich enough.

Thus, difficulties have arisen when solutions to various problems demanded that objects which were previously hidden "inside" or "behind" other objects be recognized as separate entities.

Names and Objects

We must also separate out the name for a thing from the object, the thing itself, to which those names refer. We must also separate out the form of the name for a thing from the generic concept of a name for the thing.

One of more possible names or types of names can exist for each object, and different names may or may not have structure. The structure is there to help something else do its job.

Objects

Interface:
- A place to which a producer or consumer of packets connects to the network; a network attachment point
- The place towards which the system of routers directs packets
Network:
- A collection of interfaces which have some useful relationship:
  - Any interface can send directly to any other without going through a router
  - A topology aggregate
Route or Path:
- A path from one place in the network to another

Host:
- An actual machine which is the source or destination of traffic, through some interface
Router:
- A device which is interconnecting various elements of the network, and forwarding traffic
Node:
- A host or router
- In Nimrod technical terminology, an element of a Nimrod map (which is represented as a graph, with nodes and arcs)

Names

Address:
- A structured name for a network interface or topology aggregate:
  - The structure is used by the routing to help it scale
  - Topologically related items have to be given related addresses
  Topologically related addresses also:
  - Allow the number of destinations tracked by the routing to be minimized
  - Allow quick location of the named interface on a map
  - Provide a representation for topology distribution process
  - Provide a framework for the abstraction process
DNS names:
- A structured human usable name for a host, etc
- The structure is used to facilitate the distribution and lookup

New Terminology and Concepts

A new routing architecture should add a new object, and a new naming space for them, to the packet layer: the endpoint.

An endpoint is:

Approximately, the entity at one end of a TCP connection
A fate-sharing region
An end-end entity
Can be hosts, mobile processes, etc...

Routing Fundamentals

Classes of Routing Architectures

In one of the two fundamental divisions of routing architectures, paths can be calculated either:

By each node along the route in turn, called hop-by-hop; i.e. each node makes a wholly independent decision as to which next hop a given packet will take.
In a localized way, by the source or an agent thereof), often called source-routing, but also referred to as explicit or unitary.

(Note: these latter terms, explicit and unitary, are to be preferred to source routing, as this is often taken to have the limited meaning that the host sending the packet must compute the entire route, which is then included in every packet.)

There is also a complex spectrum of possibilities between these endpoints.

Along another axis, there are two broad classes of routing algorithms:

Destination Vector (DV) algorithms, where nodes pass sets of (destination, information) n-tuples to immediate neighbours, and this information is used to construct routing tables. They are characterized by:
- The computation of the route from a given source to a given destination is evenly distributed; i.e. at any given level of abstraction, no node has a more significant responsibility in selecting the path
- Intermediate results in that computation are passed between independent nodes
Map Distribution (MD) algorithms, where nodes flood information about their immediate surroundings through the network; this information is used to construct topology maps. They are characterized by:
- The computation of the route can be either localized or hop-by-hop, but in either case it is not fully distributed
- No intermediate results are passed around
- The routing computation may easily be delayed until the path is needed

Advantages and Disadvantages of Various Routing Architectures

Hop-by-hop architectures have the following disdvantages:

They effectively require global consistency in the routing databases to prevent routing loops.

Explicit architectures have the following advantages:

They trivially allow the testing and deployment of new path-selection algorithms
They can allow the users much more control over the path of their traffic
They are more naturally immune to loops
They can be made much more robust
They can be secured against almost all attacks

DV architectures have the following advantages:

They require less computing and storage overhead than MD

MD architectures have the following advantages:

They are much more practical to secure against many attacks
They can react more quickly to changes (in topology, etc)
Their stabilization time is not only shorter on average, but also much more bounded
They require less bandwidth to react to significant changes in network topology
Policy routing (which includes most QOS routing) is only practical with MD architectures
Highly secure routing is only practical with MD architectures

The Current Internet Routing Architecture

The current Internet routing architecture has the following key characteristics:

Destination Vector type
Hop-by-hop routed

Do not be misled by the deployment of IGP's which use MD architectures, such as OSPF and IS-IS, in limited areas of the network. The overall operation of the current Internet routing architecture is DV:

The data passed between AS's consists of routing tables, not maps
The path across a number of AS's is selected piecemeal, not in a unitary fashion

Tradeoffs in Large Scale Routing

In a large network, "good routes" are not a simple, fixed goal. Rather, there is a complex tradeoff at work:

In any large network, you have to discard detailed routing information, or incur the overhead cost of massive amounts of potentially unneeded detail.
This process is called abstraction.
This process can be divided into two classes:
- Compression, in which the same routing decision is made in all cases after the abstraction as before
- Thinning, in which the routing is affected
In real networks where abstraction is needed, thinning is generally required, particularly in an environment rich in policies and link attributes (e.g. bandwidth, delay, etc).
If the routing was previously optimal, discarding routing information via thinning means non-optimal routes, i.e. costs for traffic taking extra hops, or failure to find a policy acceptable route which does exist but is not visible, etc.
Routing in large networks thus incurs two kinds of overhead cost:
- The cost of running the routing
- The cost of non-optimal routes
The challenge of routing in very large networks is managing this choice of costs.
This choice is not a purely technical issue, but rather a cost/benfit policy issue, so it is a trade-off we must to some degree leave open in the design.

Addresses, Topology and Routing

There are subtle connections between the topology of a network (i.e. how it is connected together), the abstraction hierarchy (i.e. hierarchical addressing structure) used to create addresses for it, and the routes that are generated:

An abstraction hierarchy is loosely isomorphic to a topology.
The abstraction hierarchy is chosen to minimize the sum of the two costs of routing overhead (i.e. direct costs, and non-optimal path costs), for a given topology.
The abstraction hierarchy chosen does have subtle implications on which paths are discovered, and thus, used
As the topology changes over time, the cost (i.e. the sum of the two costs) is no longer minimal.
In the long run, you have to change addresses. This should be dynamic, and semi-automatic.

Nimrod Design Goals and Principles

Technical Goals for Nimrod

Nimrod has an extensive list of goals, but this one tops the list:

Maximize the lifetime of the architecture

This is by far and away the most important goal. To understand why, let's look at why large-scale communication networks - and the design philosophy that must be used in designing them - are inherently fundamentally different from all other large-scale software systems.

Design Philosophy for Large-Scale Communication Networks

The design philosophy used for these systems is inherently different from that used in other large software systems.
The need to communicate means that simultaneous use of alternative solutions is not really viable, as it is in all other software systems: e.g. operating systems, editors, languages, etc.
This need to communicate means that if adding new capabilities means switching to a significantly different architecture, change will very difficult, as upward compatability, which is limiting, must always be observed.
The sheer scale, and difficulty, of change in such systems means that major changes are very expensive and painful.
Thus, to minimize costs over the entire life-cycle of an large-scale communication network architecture, maximizing the life-time of the architecture is the key goal.

Devolution

As fate-sharing was perhaps the key concept behind the last generation of networking architecture, the key concept of the next generation of networking architecture will probably be devolution.

Devolution, simply put, means that we will never do in the core of the network anything we can move to the edge. In other words, the network won't try and make decisions for the users, but rather will hand the responsibility over to the users.

This is related to the concept of separating policy from mechanism.

The relocation of mechanism, from the network, into the hosts, is a key part of maximizing the lifetime of the architecture. The less the network does, the easier it will be to change and upgrade functionality.

Architectural Lifetimes and Engineering Viability

There is an inherent tension between producing an architecture with a useful lifetime (and how often have we seen crippling costs when an architecture runs into its limits), and producing an architecture which is viable, in engineering terms, at the time it is first deployed.

Overhead costs, etc, which seem trivial toward the end of the lifetime of an architecture (typically twenty years, or more) seem infeasible at the time the architecture is deployed.

However, without the boldness to pay high costs up front, to allow adoption of an architecture with more room for growth over the long run, down the line the architecture will be crippled by limitations imposed in order to produce engineering savings with only short utility.

There is no easy solution to this problem.

Additional Technical Goals for Nimrod

Nimrod has many other goals, which are ancillary to simply maximizing the lifetime:

An incrementally/interoperably deployable system - without this, it cannot be deployed.
Support efficient datagram mode service; i.e. the ability of a user to send packets without requiring any prior network setup and at low overhead.
Scale indefinitely in terms of number of networks, routers, etc. The expected worldwide network system size is so large that optimizations based on a fixed maximum size aren't worth the limitations.
Provide a very robust system, one which is proof against all kinds of failures as well as attacks.
Provide flexible firewalls which allows the users to enforce as much security on the routing as they need.
Support policy routing. Policy does not mean just restrictions on traffic; it includes all routing decisions (such as QOS) not made on a simple metric or simple combination of a few metrics. Policies can be divided into the following classes:
- Restrictions on which traffic can use particular transmission resources (also called transit policies, or access control).
- Restrictions on which resources can be used by particular traffic (also called source policies or the trust model).
- Restrictions on who can find out the details about certain parts of the network (also called information hiding)
An alternative term to policy routing which is used a lot now is constraint-based routing.
Minimize the hand-configuration required of users, since this is a source of potential errors, as well as being tedious and complex.

Design Principles for Nimrod

Do the best possible design, and then work out how to deploy it; don't simply imcrementally improve the existing system.
Maximize the lifetime by:
- Making the design as flexible as possible
- Minimizing the mandatory, system wide part of the architecture
Maximize robustness by:
- Reducing the dependencies among various components
- Use of redundant mechanisms where plausible
Break the routing up into a number of subsystems, with the interfaces between them visible, and available, to the users.

Nimrod Architecture Overview

Nimrod Fundamental Architecture

The Nimrod routing architecture has the following key characteristics:

Map Distribution type
Explicitly routed

This combination has a number of advantages:

Allows maximal overall flexibility (i.e. system lifetime)
Is securable against explicit attack
Allows maximal robustness
Interacts well with flexible abstraction mechanisms
Allows policy routing (allows complete source control over packet forwarding path).
Allows maximally flexible policy routing
Interacts well with resource allocation systems

It also includes the following key concepts:

The fundamental mechanism for forwarding user data packets is the flow
Uses hierarchical abstraction to allow and manage scaling

Note: The path-selection algorithm is not part of the protocol (although a sample algorithm can be given in an appendix).

Additional Nimrod Architectural Points

The Nimrod routing architecture is also characterized by the following concepts:

Provision for efficient handling of datagram service mode packets, using a mesh of Datagram Mode Flows (DMF's) which are maintained at all times
Flexible abstraction techniques:
- Compression + Thinning ==> Abstraction
- Thinning is needed for practicality, but causes non-optimal routes
- Nimrod allows users to pick their own balance between these two costs
- Abstraction algorithms (outgoing and incoming) are not part of the spec
Complete routing system, down to the lowest levels, with no simplistic IGP/EGP split

Reflections on the Architecture

Note what is not part of the architecture:
- Path-selection algorithm
- Outgoing abstraction algorithm
- Incoming abstraction algorithm (Local Abstraction Control)
Key advantages of this approach:
- Limits amount of work to be done
- Allows future improvement
- Allows easy deployment of new algorithms
- Reduces scope for errors and bad design by minimizing global mechanisms
What is part of the architecture:
- Method of representing topology
- Method of distributing topology information
- Method of setting up flows, and handling datagram mode packets

Overview of Nimrod Mechanisms

Addresses

Nimrod uses, and allow use of, new addresses (something else the Internet needs), called locators, using a syntax and semantics which will hopefully never need to be reworked. Their characteristics are:

Variable length
Variable number of levels
Addresses can name topology aggregates as well as individual network elements
Addresses are not intended to descend from a fixed root, but will instead by built in a natural "bottom-up" style, as larger and larger addressing abstractions are defined.

Note that it is not intended that all packets will carry these addresses; packets which are using the flow service mode need only carry flow identifiers.

Maps

Connectivity information in the form of maps is the key data item in Nimrod.

Maps:

Are the data which routers pass around
Form the database which is used to selects paths

Nimrod maps consist of:

Nodes (which have an open-ended list of attributes)
Arcs (which are uni-directional links which connect nodes, and cannot have attributes; they simply indicate connectivity)

In general, it is not required that different routers have consistent maps.

More About Maps

A node in a map represents some part of the physical network. It may be as large or as small as desired: a node can represent an entire ISP, or a single interface.

A contiguous portion of the graph which represents the entire physical network may be represented by a node, or set of nodes, which are an abstraction for that part of the network.

Abstractions can be recursive.

The entities responsible for advertising the abstraction for an area of the physical network get to chose what abstraction for that area they wish to hand out.

Each node which is using maps to select paths can, within limits, decide for iself how much detail it wants to see in its map. If the information it gets by default is not detailed enough, it can go out and ask for more.

Attributes

The nodes in the map may be characterized by an open-ended (and locally extensible) list of attributes, which can include many different kinds of information. Attributes might include:

Bandwidth
Delay
Delay variance
Error rate
Cost
Allowed users (e.g. government only)

Each node also has some inherent attributes, i.e. ones which each node must have:

The locator of the node
The connectivity of the node

Path Selection

Conceptually, path selection is done by the users of the network, i.e. the hosts. However, many hosts will not want to deal with keeping maps, running path-selection algorithms, etc. It is thus expected that most hosts will get paths from route servers.

Route servers have the advantage that they form a central point at which an organization can express organizational policies with regard to path preferences through the network outside the organization.

Path selection across a large graph, in the presence of multiple constraints, is a difficult problem, and will probably be the subject of research for decades to come.

User Data Forwarding

The plan is for Nimrod to support four basic modes for handling user data traffic:

Flow mode, where the user does a prior flow setup (which is also part of Nimrod), which can include arbitrarily complex arrangements for resource allocation, security, billing, etc; packets then carry only flow-identifiers for doing forwarding.
Node Chain mode, where packets carry a list of nodes, and the packet is required to go through the nodes listed, which should define a continuous path across the network. (This is basically strict source route.)
Node Sequence mode, where packets carry a list of nodes, but the list does not define a continuous path across the network, merely points the packet has to travel through. (This is basically loose source route.)
Datagram mode, where every packet header carries source and destination locators. (This is basically normal IPv4-type forwarding.)

The first and last modes are intended to be the ones principally used; the others are for special situations, fault isolation, critical network operations, etc.

Datagram Mode

Flows are the fundamental entity in Nimrod, and no packet travels anywhere except in a flow. To provide Datagram mode service, Nimrod routers will assign a packet to a sequence of Datagram Mode Flows (DMF's), which are relatively short-distance flows which are set up specifically to handle Datagram mode packets.

The network is completely spanned by a mesh of DMF's. The packet traverses the network by being assigned to a sequence of DMF's, a sequence which is specifically selected to move the packet towards its eventual destination.

Datagram mode packet headers contain:

A locally usable path-id field
Source and destination locators
A pointer into the locators

Datagram Mode Operation

While in a DMF, the forwarding routers do not examine the locators, only the flow-id.

Only active routers (i.e., one that actually makes a decision about where to send the packet, after it has exited a DMF - as opposed to handling it as part of a flow) look at the locators.

At each active router, the router examines the locators in the header to see where to send the packet next, assigns the packet to the appropriate flow, and sends the packet off.

The pointer is initialized to point at the lowest level of the source locator. From there, it moves up that locator, then to the destination locator, and then down.

While going up the source locator, each active router selects a DMF which will take the packet to the next higher level object in the source locator, and then advances the pointer.

The process repeats, until the packet reaches a router which represents an abstraction which is at the least common intersection of the two locators. (E.g., for A.P.Q.R and A.X.Y.Z, this would be when the packet reaches A).

The process then inverts, with each active router selecting a DMF which takes the packet to the next lower object in the destination locator. So, A would select a flow to A.X, and once it got to A.X, A.X would select a flow to A.X.Y, etc.

Datagram Mode Flows

All routers have to contain a minimal set of pre-setup Datagram Mode flows (or be prepared to set these flows on demand) to certain routers, which are at critical places in the abstraction hierarchy.

These flows are used to carry Datagram mode packets through the network. It is purely a local decision which, if any, of those flows to set up before there is an actual traffic requirement for them.

There is a minimum set of flows which do have to be able to be set up for the system to operate, however.

Datagram Mode Flow Optimization

If the system keeps more than the minimal set of DMF's, and the router allows longest-match lookups (e.g., in much the same way as the current routing table for hop-by-hop datagrams), routing can be more efficient. I.e. packets will no longer travel along strictly hierarchical paths, but will "short-cut" closer to ultimate destinations. (When this happens, the pointer will advance more than one level at a time in the locators.)

Traffic monitoring and analysis (again, using purely local algorithms) can result in a database being created over time, which shows which DMF's above and beyond the minimal set are worth keeping around.

This traffic monitoring could also show which flows from the required minimal set of DMF's it would be useful to set up in advance of actual traffic which needed them.

Note that all these sets can be changed in a local, incremental way, without disturbing the operation of the system as a whole.

Datagram Mode Efficiency and Robustness

The forwarding of Datagram mode packets can be quite efficient (possibly more so than even standard hop-by-hop):

In the non-active routers, the packet is associated with a flow.
In active routers, the process of looking up the next DMF would be about as expensive as the current routing table lookup.

It can easily be seen that the process guarantees that the resulting path is loop-free:

Each flow selected must necessarily get the packet closer to its destination.
The flows themselves are guaranteed not to loop when their paths are selected, prior to being set up.

Multicast Support

Nimrod approaches multicast with the same ideas used elsewhere: try and break the problem up into pieces, and put as much of the functionality as possible outside the architecture, to allow flexibility in algorithms, etc.

This last is especially important for multicast, where group sizes can vary by many orders of magnitude.

Nimrod separates several distinct phases of creating a multicast group:

Determining membership
Deciding what kind of data distribution you want (per source, or whatever)
Calculating one or more spanning trees which connect the members
Installing the state about those spanning trees in the routers
Forwarding user data

Nimrod provides some very fundamental multicast building block(s), such as multicast flow setup. All the rest (like the mechanisms to maintain the state about group membership, calculate the spanning tree, etc) are outside the core architecture. Users can select locally whichever algorithm makes sense in their particular application.

This has all the advantages of the Nimrod approach to unicast:

Makes it a simpler design
Makes it less likely there will be something wrong
Allow experiments with new algorithms
Allows incremental deployment of new algorithms

Nimrod will also distinguish between a multicast group (i.e. a name for the set of members) and a particular multicast flow (i.e. a particular data distribution channel to that group). There can be multiple flows which go to the same group, but controlled independently.

Flow Aggregation

Flow aggregation:

Provides no extra capability to the users
Adds complexity

So, why add it?

Because it is needed to allow a positive economy of scale in high-speed switches, since the unit (i.e. per-flow) cost of per-flow memory is higher there.
Flow aggregation also has the nice property that it works well with virtual links.

Note that you don't have to have a lot of aggregation, just enough to keep the unit cost of per-flow state memory declining at an appropriate rate.