
ProtoGENI Version Zero

What needs to be done for "version zero"?

  • Shared nodes
    • Run our own PLC.
    • Run our own nodes with the PlanetLab image, and allow people to get slivers on them.
    • Our PLC talks to our ELC (?).
    • Status: David has made a lot of progress on this front - we can run our own PLC, with its own nodes, controlled by Emulab.
    • Status: In preparation to ship the first set of wide-area nodes.
  • Tunnels
    • Status: We have GRE tunnels working inside Emulab, but there are issues in applying them to remote shared nodes.
      • One possibility is to have multiple routing tables, where each slice gets its own. Downside: requires kernel hacking or the VINI kernel, which is not stable enough yet (4/22/08). In particular, we can't tear down vservers without crashing/hanging due to a ref count problem. Mike has to focus on unrelated work for a while, so the next option is the current bet.
      • Another possibility is to force everyone to use separate experimental-net IP spaces. The downside is that we might have to do kernel hacking to get multiple interfaces in a vserver, and might end up throwing away the code for separate IP spaces eventually.
  • Two-level naming scheme
    • Status: Version zero will stick to the Emulab project/experiment fixed-level hierarchy rather than the arbitrarily deep hierarchy we will eventually use.
  • Layer 2 devices, in a geographically distributed network
    • Status: NetFPGAs are supported and installed in Emulab. More work needs to be done to support them on shared PlanetLab-like nodes.
  • Wide-area events
    • Status: Most or all of the necessary work has been done, but needs reliability/shakedown testing.

(Also see the MinutesFrom09Nov2007.)

Choices for Node Software

Use VINI

What we need to do to incorporate the "VINI kernel" for use on shared, wide-area ProtoGENI nodes.

First, let's be clear on what we want out of the VINI kernel:

  • Virtual nodes with virtual links on shared physical nodes in the wide area.
  • Physical nodes that host multiple virtual nodes.
  • Only one vnode per slice (experiment) per physical node. Note that this is different than what we allow in the local (cluster) case.
  • Virtual nodes may have multiple interfaces (end points of multiple virtual links). These interfaces look like ethernet interfaces (i.e., layer 2).
  • Each virtual node will have a "control interface", routed over the Internet. This interface is IPv4 (layer 3). Whether the address is shared with all other co-located virtual nodes and whether the port name space is shared is TBD, see below.
  • Multiple virtual links (from the same or different slices) may be multiplexed over one physical link.
  • Physical links may be layer 2 tunnel devices over IP (initially) or raw layer 2 devices (ultimately, on dedicated I2 waves).
  • Plab resource isolation.
  • Plab link monitoring.

Version Zero (V0) is just getting wide-area ProtoGENI nodes into topologies; no GENI APIs need apply. For this we should be able to use the VINI setup (kernel + NM + utils) largely unchanged. With the exception of the control network, things should work for us.

How do we determine what resources are available on a V0 node? Largely, the same way we do for Plab. For CPU/memory/disk we use CoMon (or whatever it is we are currently using). For network BW for the experimental links, we likely do nothing.

Experimental interfaces are configured from inside vservers using vsys:

echo LABEL REMOTE KEY > /vsys/setup-link

where LABEL is something to identify the device, REMOTE is the IP of the other end, and KEY is the tunnel key (not sure of the /vsys "device" name). Not sure how this is authenticated (i.e., what tunnels a vserver is allowed to create). This will create the tunnel on the outside, create a virtual interface and bridge it to the tunnel, and enable traffic to the interface. It also creates a couple of dynamic vsys interfaces for grabbing the link and for shutting down the tunnel. The former (/vsys/grab-etunN) will move the interface into the slice namespace (not sure why this is a separate interface) and the latter (/vsys/delete-etunN) tears down the link/bridge/tunnel. These should hook in easily to our client scripts.
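
A rough end-to-end sketch of that lifecycle from inside a vserver, assuming the entry names above and a link number of 0; the exact /vsys names, their arguments, and the addressing are all guesses here:

    # Ask the host to build the tunnel and bridge a virtual interface to
    # it (the label, remote IP, and key are made-up values).
    echo "link0 155.98.36.1 1234" > /vsys/setup-link

    # Pull the resulting interface (assumed to appear as etun0) into this
    # slice's namespace, then configure it like any ethernet device.
    echo etun0 > /vsys/grab-etun0
    /sbin/ifconfig etun0 10.0.5.2 netmask 255.255.255.0 up

    # Tear down the link, bridge, and tunnel when finished.
    echo etun0 > /vsys/delete-etun0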

The control network is the big problem. Some alternatives:

  1. The VINI way. VINI is planning on using the vsys interface to allow processes to be moved in and out of the vserver's network namespace. While outside the namespace a process has access only to the Internet; while inside, only to the topology. The interfaces are something like:
    echo PID > /vsys/enter_admin	# get PID outside
    echo PID > /vsys/enter_topo	# get PID inside
    
    where PID is the pid of the process to move. But this would seem onerous for applications that want to monitor/control the topology from outside. I'm not even sure what the consequences are of a single process toggling back and forth between namespaces, since open sockets, etc. should disappear when you change, and you would have file descriptors pointing to who knows what. So most likely there will be a process running inside talking via the filesystem to a process outside. I say filesystem because I am assuming that unix domain sockets are part of the network namespace and thus cannot be shared. Perhaps a fifo (see the sketch after this list). Regardless, the model will be considerably different from what our node agents expect.
  2. Tunnelling to Emulab. One way to preserve more compatibility with the current Emulab Way would be to use the VINI vsys interface in #1 to also create a tunnel device back to boss/ops for the control net. This way we can assign each vserver its own virtual control net, with its own IP, in the same namespace as the topo. At the Emulab end of the tunnel we bridge them all into the vnode namespace (172.16). This means that people will have to jump through Emulab to get to the nodes, which is not convenient but may be the way of the future if/when the GENI control plane is no longer IP-based.
  3. NAT. A final way is to give each vserver its own IP as in #2, but bridge them all together on the physical host and then run a NAT to map them to the single physical host IP. You essentially have all the same restrictions as in PlanetLab (i.e., a limited port range). If we map each vserver to a fixed range of ports, then we can even allow incoming connections (see the sketches after this list). Provisions can be made for using reserved ports as well. This approach means modifying the base VINI system in order to set up and maintain the NAT configuration.
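
For alternative #1, the inside/outside split would probably end up looking something like the following. This is a sketch only: it assumes the enter_admin/enter_topo entries above, that the slice's filesystem is shared across both namespaces, and the host names, addresses, and probe are entirely invented.

    # A fifo on the slice's filesystem should be visible from either
    # network namespace.
    mkfifo /tmp/topo-pipe

    # Outside (admin) process: it only has Internet access, so it relays
    # whatever shows up in the fifo back to Emulab (destination and file
    # name invented for this sketch).
    cat /tmp/topo-pipe | ssh ops.emulab.net 'cat >> monitor.log' &
    echo $! > /vsys/enter_admin

    # Inside (topo) process: it only sees the virtual links, so it probes
    # a neighbor and writes the results into the fifo. (The first probe
    # may race with the namespace move.)
    ping -i 60 10.0.5.1 > /tmp/topo-pipe &
    echo $! > /vsys/enter_topo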
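
For alternative #3, mapping each vserver onto a fixed slice of the host's port space could be done with ordinary iptables NAT rules on the physical host; again a sketch only, with the addresses and port range invented:

    # The physical host has one public address; each vserver sits behind
    # the shared bridge with a private address.
    PUBLIC=155.98.36.10
    VSERVER=192.168.0.2

    # Incoming connections: give this vserver ports 20000-20999 on the
    # public address, forwarded to the same ports on its private address.
    iptables -t nat -A PREROUTING -d $PUBLIC -p tcp --dport 20000:20999 \
        -j DNAT --to-destination $VSERVER

    # Outgoing connections: rewrite the vserver's traffic to the public
    # address, restricting its source ports to the same range.
    iptables -t nat -A POSTROUTING -s $VSERVER -p tcp \
        -j SNAT --to-source $PUBLIC:20000-20999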

I ruminate on the "control net problem" further in ProtogeniNetworkAccess.

Use PlanetLab

If the VINI kernel is unstable, we can use the plab kernel instead, with certain restrictions:

  • One vnode per experiment, per physical node. That is, avoid the need for separate routing tables and the revisitation problem. This is the "PlanetLab Way" so not really a restriction.
  • Distinct experiments cannot use overlapping IP space on nodes, so that we again avoid the need for multiple routing tables. This requires that we be in charge of IP assignment for at least all tunnel devices. See task #1 below.

The tasks:

  1. Enforce global allocation of experimental IP space. Probably just gnaw off a chunk of the 10. space for tunnel use and manage that. This would have less impact on IP assignment as it happens today and, if chosen carefully, would not interfere with user-assigned addresses past, present, or future. Perhaps better would be to use part of the 172.16 space: we use 172.16-18 for vnodes, but the rest could be used. The downside is that we route this range internally, so there is potential for confusion (something like: someone has a tunnel, sets up or deletes a route incorrectly, causing 172.16 tunnel traffic to go out on our control net to other nodes).
  2. Modify the NM and plab kernel to allow/do creation of tunnel devices inside vservers. Probably everything is there; we just need to modify the NM to create and configure bridges before it sets up the vserver (see the sketch below). The vserver can then be created with multiple interfaces: control and bridges. Note that we should be able to dispense with the virtual IFs and bridges that are in VINI, because there is no need to create a "tunnel" between namespaces and we can prevent the vserver from modifying the bridge setup once inside.
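
A sketch of what that host-side setup might look like before the vserver is brought up. It assumes an ethernet-over-GRE (gretap) device standing in for whatever layer 2 tunnel the plab kernel ends up providing, and the device names, addresses, and key are invented:

    # Build the layer 2 tunnel to the peer node for this virtual link.
    ip link add etun5 type gretap remote 155.98.36.1 key 1234
    ip link set etun5 up

    # Create the bridge that will back the vserver's experimental
    # interface and attach the tunnel to it.
    brctl addbr br5
    brctl addif br5 etun5
    ip link set br5 up

    # Now create the vserver with its usual control interface plus an
    # interface on br5; the bridge was built outside, so the slice cannot
    # modify it from within.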