After a short hiatus imposed by a broken elbow, it’s high time for an update again. This time around, I want to focus on some basic design considerations that are derived from the requirements I explored in previous posts. The aim is to clarify some concepts, and as a result get a decent idea of the information we need to transmit in packet headers.
In the previous post, I proposed starting with UDP as the base on which to build. That idea remains, but I would like to expand on it a little. The reason I gave is that for UDP (IP, really) there already exists routing equipment all over the internet, which means that any protocol built on it should be routable almost everywhere. Also, it can be implemented as an application-level protocol, which means there will be fewer hurdles for widespread deployment.
Let’s briefly explore your typical Internet stack in order to get some basic routing concepts on the table, as they lie at the core of what we need to do.
As a stand-in for any lower-level transport, Ethernet works pretty well. Ethernet is mainly concerned with addressing physical hardware ports on connected machines. That is, each Ethernet card has a hardcoded MAC address, and Ethernet is concerned with transmitting data from an Ethernet card with one MAC address to another Ethernet card with a different MAC address.
The level of intelligence in Ethernet is best illustrated by the difference between Ethernet hubs and Ethernet switches. Ethernet hubs dumbly copy any data that arrives on one of their ports to all other ports, in the blind hope that the destination card is attached to one of them. By contrast, Ethernet switches understand which MAC addresses are to be addressed over which port, and copy packets selectively (hubs are rarely used these days).
But if you consider that hubs suffice for connecting Ethernet cards, it becomes clear that Ethernet is not overly concerned with network topology; conceptually, it is not much more complex than a bus (though buses are usually differentiated from Ethernet by also providing power).
Ethernet frames always contain the source and destination MAC addresses to enable switch operation.
The IP layer introduces the concept of network masks. MAC addresses are assigned by the networking equipment manufacturer, and manufacturer prefixes ensure global uniqueness. By contrast, each IP address is considered to have some bits identifying a network, and some bits the machine within the network. The network mask identifies which bits belong to which part.
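To make the network/machine split concrete, here is a minimal sketch using Python's standard `ipaddress` module. The address and the /24 mask are arbitrary values chosen for illustration.

```python
import ipaddress

# An interface address together with its network mask, in CIDR notation.
# /24 means the first 24 bits identify the network (mask 255.255.255.0).
iface = ipaddress.ip_interface("192.168.10.37/24")

network = iface.network                                    # bits selected by the mask
host_bits = int(iface.ip) & int(iface.network.hostmask)    # the remaining bits

print(iface.netmask)   # the mask itself
print(network)         # the network part, with host bits zeroed
print(host_bits)       # the machine within the network

# Two addresses share a network if masking them yields the same prefix.
other = ipaddress.ip_address("192.168.10.200")
print(other in network)
```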
Protocols such as ARP are used to resolve IP addresses to the MAC addresses of the corresponding network cards.
IP routers are significantly smarter than Ethernet switches, in that they use additional routing information (either configured on the router or transmitted via routing protocols) in order to understand how to reach IP networks they’re not directly connected to. The most basic concept is the default route: the address of a gateway to which all packets are forwarded for which no more specific routing information is available.
But exchanging routing information is not part of IP as such. On the packet-by-packet level, IP maps destination IP addresses to destination MAC addresses, and from there to ports, using whatever routing information is available locally. This makes IP an abstraction layer over Ethernet which is primarily concerned with network topologies – but it still addresses machines via their networking cards.
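The per-packet decision can be sketched as two table lookups: a longest-prefix match to find the next hop (falling back to the default route), then an ARP-cache lookup to find that hop's MAC address and outgoing port. Both tables below are hypothetical, hand-filled stand-ins for what a real host would learn from configuration, routing protocols and ARP.

```python
import ipaddress

# Hypothetical routing table: destination prefix -> next-hop gateway.
# 0.0.0.0/0 is the default route and matches every IPv4 destination.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): ipaddress.ip_address("10.0.0.1"),
    ipaddress.ip_network("10.0.0.0/24"): None,  # directly connected network
    ipaddress.ip_network("172.16.0.0/16"): ipaddress.ip_address("10.0.0.254"),
}

# Hypothetical ARP cache: IP address -> (MAC address, outgoing port).
ARP_CACHE = {
    ipaddress.ip_address("10.0.0.1"): ("02:00:00:00:00:01", "eth0"),
    ipaddress.ip_address("10.0.0.7"): ("02:00:00:00:00:07", "eth0"),
    ipaddress.ip_address("10.0.0.254"): ("02:00:00:00:00:fe", "eth0"),
}

def next_hop(dst: str):
    """Choose the most specific (longest-prefix) route containing dst."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]  # never empty: default route
    best = max(matches, key=lambda net: net.prefixlen)
    gateway = ROUTES[best]
    # On a directly connected network, the destination itself is the next hop.
    return addr if gateway is None else gateway

def resolve(dst: str):
    """Map a destination IP to the next hop's MAC address and port.

    A real stack would send an ARP request on a cache miss; this sketch
    simply raises KeyError.
    """
    return ARP_CACHE[next_hop(dst)]
```

For example, `resolve("10.0.0.7")` is delivered directly on the local network, while `resolve("8.8.8.8")` matches only the default route and is handed to the gateway at 10.0.0.1.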
IP packets always contain the source and destination IP addresses.
Protocols such as UDP and TCP add a concept of ports. IP contains a protocol field, which indicates whether it encapsulates e.g. UDP or TCP, and each of these then provides its own additional headers.
Ports serve a dual purpose, depending on whether they’re used on the machine initiating or accepting a connection, and the mechanisms also depend on which of the two protocols we’re discussing.
In both protocols, ports on the destination machine are statically associated with server software. Server software listens on IP and port combinations for incoming connections. The port it chooses is typically determined by the server protocol, such as port 80 for HTTP, but nothing in TCP absolutely prescribes this choice. Rather, it’s a function of the IANA Service Name and Transport Protocol Port Number Registry and, well, the conventions arising from it.
Fixed server-side ports are required because, without an additional channel of information, clients could not know which server port to connect to in order to speak a particular protocol. So ports on the server side effectively specify services.
On the client side, ports identify user activities. A user may choose several different clients to speak HTTP to the same server for multiple different intents; for this reason, clients are often called user agents, as they act on behalf of a user for a particular purpose. As a consequence, ports on the client side must be largely ephemeral, assigned to user agents as needed.
In TCP, that’s exactly what tends to happen: the operating system assigns an ephemeral port to the user agent when a socket gets connected. In UDP, connection-less sockets also permit the user agent to choose a local port. These ports are still ephemeral, as user agents will choose from unused ports.
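The asymmetry is easy to observe with plain UDP sockets on the loopback interface. In this sketch the "service" binds explicitly, while the "user agent" never binds and gets an ephemeral port from the operating system on its first `sendto()`. One caveat: a real service would bind to its well-known port; port 0 is used here only so the sketch cannot collide with anything already running.

```python
import socket

# The "service" binds explicitly. A real service would use a fixed,
# well-known port; port 0 (let the OS pick) only avoids collisions here.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.settimeout(2.0)
server.bind(("127.0.0.1", 0))
service_addr = server.getsockname()

# The "user agent" never calls bind(); the OS assigns an ephemeral
# local port implicitly on the first sendto().
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)
client.sendto(b"hello", service_addr)
client_addr = client.getsockname()

# Each datagram carries both ports, so the service learns where to reply.
data, peer = server.recvfrom(1024)
server.sendto(b"hi", peer)
reply, _ = client.recvfrom(1024)

print("service:", service_addr, "user agent:", client_addr, "seen as:", peer)
client.close()
server.close()
```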
In either case, source and destination ports are sent with every packet. Amongst other things, this enables firewalls to work on port-based rules.
Multipath TCP cannot make do with ports alone. Rather, once an initial connection has been established, additional IP and port combinations are exchanged in order to inform peers of other communication possibilities. In order to identify packets as belonging to the same connection, endpoints choose a connection key and exchange it with their peer. New IP and port combinations are then associated with a connection via its key.
However, one of the interesting aspects here is that MPTCP does not send these keys in every packet. They – or rather, tokens derived by hashing the key – are sent when explicitly establishing or breaking the association of a new flow with the original connection. As such, neither these keys nor their tokens are used for routing purposes.
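For reference, RFC 8684 (MPTCP v1) derives the token as the most significant 32 bits of a SHA-256 hash of the 64-bit key (the earlier RFC 6824 used SHA-1 instead). A minimal sketch of that derivation, with random keys standing in for the ones exchanged during the initial handshake:

```python
import hashlib
import secrets

def mptcp_token(key: bytes) -> bytes:
    """MPTCP v1 token: the most significant 32 bits of SHA-256(key)."""
    if len(key) != 8:
        raise ValueError("MPTCP keys are 64 bits long")
    return hashlib.sha256(key).digest()[:4]

# Each side picks a random key and sends it during the initial handshake.
local_key = secrets.token_bytes(8)
remote_key = secrets.token_bytes(8)

# When a new subflow joins, it names the connection by the receiver's
# token rather than repeating the key itself.
join_token = mptcp_token(remote_key)
print(join_token.hex())
```

The token is only a short handle for looking up the connection when a subflow is added; ordinary data packets carry neither key nor token.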
It is worth pointing out, though, that server-side ports subsequently associated with a connection need not be the fixed port of any particular server protocol. The server can choose them as ephemeral ports as well, as long as it is listening on them.
All of this, of course, relates to existing technologies, some of which we intend to build on, and some of which we may use only for inspiration. With these details recalled, however, we can now speak more precisely about the purpose of each piece of packet header information. None of this is particularly new (it’s all in the OSI model), but it illustrates how different protocols fulfil their roles within the model.
- At the link layer, Ethernet is concerned with addressing networking hardware. And if a destination is unknown, Ethernet is not really able to do anything about it.
- At the internet layer, IP is effectively trying to solve this problem of the link layer by adding routing to abstract machine addresses.
- At the transport layer, TCP and UDP are concerned with providing a connection between a service and a user agent. For discoverability purposes, the service address is fixed, and for the purpose of supporting multiple users and use cases, the user agent address is partly ephemeral.
- Somewhere between the transport and the application layer, MPTCP provides a more abstract concept of a connection between a user agent and a service that is not tied to individual IP addresses and ports, but instead to keys.
For the purpose of MPTCP, the keys used to identify a flow as part of a larger connection could, well, be any random string really. But MPTCP also intends these keys to authenticate a new flow as being part of an existing connection. Let’s ignore the details of how that’s supposed to happen here.
It’s important to highlight a few things:
- Keys with different scopes occupy the same namespace. One key effectively identifies a service, and the other a user agent.
- The keys are, well, let’s say more static than the endpoint identifiers used in the underlying flows. While there is no guarantee that keys are actually statically assigned to a user agent or service, it’s clear that ephemeral parts of the flow identifier are required on one side, and useful on the other.
- Having at least part of the flow identifiers relatively static allows for efficient routing. In the stack above, that role is played by the MAC address and the IP address, at different layers of the OSI model. For that to work, though, the identifiers of both endpoints need to be included in every packet header.
There’s a certain amount of conflict in the above characteristics that the Internet stack kind of solves by delegating different parts of the problem to different OSI layers. And while it’s clear we’re starting out by building on top of UDP, inheriting some of those layers, if we can somehow reconcile these conflicts in our keys, we may also have the ability to build directly on IP or indeed on Ethernet at a later stage.
The easiest way to do this is to conceptually break the key into several parts. And really, we should not be talking about a key here; rather, it’s an identifier for either a service or a user agent. Let’s call it a peer identifier instead of a key, consisting of the following parts:
- A routing part. This part must remain fairly static for routing to work. We’re not overly concerned with routing at the moment, so for our purposes this could be a fixed bit string, a zero-length bit string, or some random bits.
- An authentication part. This part effectively identifies the user agent or service, once routing is solved. In one way or another, it should be used for authenticating the peers – this does not mean it has to be key material, but it certainly would be sensible to have it somehow derived from key material in a predictable fashion, such that both endpoints knowing the same key can derive the same authentication part from it, and map back to the same key when they see it. An example might be (a truncation of) a public key fingerprint.
- An ephemeral part. This part can be optional, but if present, would serve to identify unique and independent instances of the same authenticated peer. At the moment, we can expect it to be a zero-length bit string, but in an implementation directly on IP or Ethernet, it would effectively serve the same role as ports do in UDP and TCP.
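A minimal sketch of such a segmented peer identifier, assuming one hypothetical implementation that uses no routing part, a 16-byte authentication part taken from a truncated SHA-256 fingerprint of a public key, and no ephemeral part. The sizes, the hash and the class name are illustrative choices, not part of any specification.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical part lengths (bytes) for one implementation in the suite.
ROUTING_LEN = 0      # routing not addressed yet
AUTH_LEN = 16        # truncated public key fingerprint
EPHEMERAL_LEN = 0    # not needed while running on top of UDP

@dataclass(frozen=True)
class PeerIdentifier:
    routing: bytes
    authentication: bytes
    ephemeral: bytes

    def __post_init__(self):
        lengths = (len(self.routing), len(self.authentication), len(self.ephemeral))
        if lengths != (ROUTING_LEN, AUTH_LEN, EPHEMERAL_LEN):
            raise ValueError(f"unexpected part lengths {lengths}")

    @classmethod
    def from_public_key(cls, public_key: bytes, routing=b"", ephemeral=b""):
        # Anyone who knows the same public key derives the same
        # authentication part from it.
        fingerprint = hashlib.sha256(public_key).digest()[:AUTH_LEN]
        return cls(routing, fingerprint, ephemeral)

    def to_bytes(self) -> bytes:
        return self.routing + self.authentication + self.ephemeral

    @classmethod
    def from_bytes(cls, raw: bytes):
        # Fixed part lengths mean fixed offsets into the identifier.
        a, b = ROUTING_LEN, ROUTING_LEN + AUTH_LEN
        return cls(raw[:a], raw[a:b], raw[b:])

# Mapping an authentication part back to the key it was derived from.
known_keys = {}

def remember(public_key: bytes) -> PeerIdentifier:
    pid = PeerIdentifier.from_public_key(public_key)
    known_keys[pid.authentication] = public_key
    return pid
```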
All of this is a tad theoretical. We really only need the authentication part in order to identify flows as belonging to the same connection. What matters, though, is this: if the three parts of a peer identifier can be tuned differently in different embodiments of the same concept (a multi-link connection between a user agent and a service), then perhaps the right thing to do is to acknowledge that the result of these specification efforts will be a protocol suite rather than a single protocol.
Each protocol in the suite may define a different structure for peer identifiers, and arrive at different peer identifier lengths. But they’re all the same in that a tuple of two peer identifiers sufficiently identifies a conceptual connection over multiple links, and for (future?) routing purposes, this tuple should be sent in every packet header.
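A minimal sketch of that idea: a hypothetical connection table keyed by the (destination, source) peer identifier pair, so that packets arriving over different UDP flows land in the same connection. Authenticating a new flow, which a real implementation would need before accepting it, is deliberately left out.

```python
from collections import defaultdict

class ConnectionTable:
    """Group incoming packets into connections by their peer identifiers.

    The UDP source address only identifies a flow; several flows can
    belong to one connection.
    """

    def __init__(self):
        # (local peer id, remote peer id) -> set of (ip, port) flows seen
        self.flows = defaultdict(set)

    def on_packet(self, dst_peer: bytes, src_peer: bytes, udp_source: tuple) -> tuple:
        key = (dst_peer, src_peer)
        self.flows[key].add(udp_source)
        return key

table = ConnectionTable()
# The same user agent reaches the same service over two different links.
k1 = table.on_packet(b"service", b"agent", ("192.0.2.1", 40000))
k2 = table.on_packet(b"service", b"agent", ("198.51.100.7", 51000))
print(k1 == k2, sorted(table.flows[k1]))
```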
There are many different ways to negotiate protocol features, and peer identifier characteristics may conceivably be the kind of thing considered a protocol feature that should be negotiated.
There’s one problem with that, however: we may need to route packets before we can negotiate anything. That is, peer identifier characteristics are the one thing that cannot be negotiated between peers.
In Ethernet frames or IP packets, there is space for a field indicating which encapsulated protocol is in use. We could conceivably follow the same route, except that of course UDP does not contain such a field. The implication is that we need to provide it in our own packet header, and at a fixed offset independent of any other protocol features that may be negotiated. The first few bytes make for a good offset here.
So what kind of other protocol features could we need to negotiate? Well… maybe we don’t need any negotiation at all.
If you look at the security issues that have plagued SSL and TLS throughout their history, they all boil down to two categories:
- Either there are issues in the messages being sent that can be exploited.
- Or the negotiated features add complexity that can be exploited.
Guess which category produces more problems? A clue may be found in Qualys’ SSL Server Test, which rates the security of SSL/TLS servers. And while it definitely has strong complaints about known broken protocol versions, by far the most checks it performs are for problematic protocol features and outdated cipher suites.
I understand why the TLS protocol wants these negotiated. It allows for a slower update cycle of TLS versions, whilst leaving the actual security of server implementations somewhat in the hands of the user. And it’s been pretty clear for a long time now that many users do not know enough about these security characteristics to provide secure implementations.
By contrast, WireGuard effectively ties the crypto ciphers to the protocol version via its Construction and Identifier parameters. Both are actually folded into the peer identifier equivalent via a hash function, establishing an inseparable connection between the peer and the protocol it speaks. Any packets not conforming to those expectations are silently discarded, as if the peer did not even exist.
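As described in the WireGuard whitepaper, the handshake begins by hashing the Construction string, then hashing the result together with the Identifier, and finally mixing in the responder's static public key. The sketch below reproduces only those first steps with Python's `hashlib.blake2s` (32-byte digests); it is not a handshake implementation, and the all-zero public key in the test is a placeholder.

```python
import hashlib

# Protocol parameters as given in the WireGuard protocol description.
CONSTRUCTION = b"Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s"
IDENTIFIER = b"WireGuard v1 zx2c4 Jason@zx2c4.com"

def blake2s(data: bytes) -> bytes:
    return hashlib.blake2s(data).digest()  # 32-byte digest by default

def initial_hash(responder_static_public: bytes) -> bytes:
    chaining_key = blake2s(CONSTRUCTION)          # C := HASH(Construction)
    h = blake2s(chaining_key + IDENTIFIER)        # H := HASH(C || Identifier)
    return blake2s(h + responder_static_public)   # H := HASH(H || S_pub_r)
```

Because the cipher choice and version are baked into the very first hash, a peer using a different Construction or Identifier derives different handshake state from the start, and its messages simply fail to verify.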
The downside is that changes to the cryptographic features of the protocol (sort of) lead to a new protocol version (really only the Construction changes, but the resulting hash changes as well). But on the plus side, you cannot negotiate your way into an insecure situation.
In this proposed protocol suite, we’ll follow the WireGuard model. That is, we establish one preferred set of cryptographic methods. We’ll also establish a protocol version that lays out basic messages and the handshake. And their combination will be the unique protocol identifier for this particular implementation of the protocol suite.
How precisely these are hashed into a unique identifier is not important at this point, and may change from implementation to implementation. The only requirement is for the entire suite to (eventually) agree on a fixed length for the result, potentially obtained by truncation. This fixed number of bits will be the first part of every packet header, followed by the destination and source peer identifiers.
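As an illustration of the "hash, then truncate" idea, and of checking the identifier at a fixed offset, consider the following sketch. The version string, the cipher names, the 4-byte length and the use of BLAKE2s are all hypothetical placeholders, since these choices are deliberately left open here.

```python
import hashlib

# Hypothetical inputs; nothing here is specified yet.
PROTOCOL_VERSION = b"example-suite/handshake-v1"
CRYPTO_CONSTRUCTION = b"X25519+ChaCha20Poly1305+BLAKE2s"
ID_LEN = 4  # hypothetical suite-wide identifier length, in bytes

def implementation_id(version: bytes, construction: bytes) -> bytes:
    # Length-prefix each input so different splits of the same bytes
    # cannot produce the same hash input; then truncate the digest.
    data = b"".join(len(x).to_bytes(2, "big") + x for x in (version, construction))
    return hashlib.blake2s(data).digest()[:ID_LEN]

OUR_ID = implementation_id(PROTOCOL_VERSION, CRYPTO_CONSTRUCTION)

def accept(packet: bytes):
    """Return the rest of the packet if it speaks our implementation, else None."""
    if len(packet) < ID_LEN or packet[:ID_LEN] != OUR_ID:
        return None  # silently discard: no error reply, nothing to negotiate
    return packet[ID_LEN:]
```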
Remember how we were previously discussing channels?
Channels share some superficial similarity with ports or, in our more abstract model, with the ephemeral part of a peer identifier. But their purpose is a little different.
Where ports or the ephemeral part exist primarily for identifying different instances of the same software or authenticated peer, channels exist to further disambiguate packets within an existing connection.
The purpose of channels is to allow peers to separately communicate different concerns. In a video streaming situation, for example, it may be sensible to provide video and audio data on different channels. Our brains are surprisingly good at filling in gaps in audio caused by lost packets, but find skipped frames in video extremely jarring. It would therefore be useful not only to treat these streams with different priorities, but also to ensure that packet loss on one channel does not impact the others.
Hence, channel identifiers. We’ll leave out for the moment how channel identifiers are chosen, and how new channels are opened or existing channels closed. For the moment, two things suffice:
- A channel identifier is needed in the packet header.
- A default (zero) channel identifier is needed for initial messages, such as cryptographic handshakes.
We’ll need the default identifier so that implementations can perform cryptographic handshakes without leaking any messages concerned with channel establishment. By extension, we can also decide that this channel should largely be used for purposes that span the entire connection. This could be:
- Cryptographic handshakes
- Channel establishment
- Flow association
In order to keep channels virtually independent of each other, each packet must belong to one channel and one channel alone. On the other hand, the characteristics of each channel can be negotiated independently (but this happens after the cryptographic handshake, so it is not as security-relevant).
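A minimal sketch of channel demultiplexing under these rules: a reserved default channel 0 that is always open, one queue per channel so that gaps or backlog on one channel never hold up another, and packets for unopened channels dropped. The `open` method is a hypothetical stand-in for whatever channel establishment messages end up being exchanged on the default channel.

```python
from collections import defaultdict, deque

DEFAULT_CHANNEL = 0  # handshakes, channel establishment, flow association

class ChannelDemux:
    def __init__(self):
        self.open_channels = {DEFAULT_CHANNEL}
        # One independent queue per channel: loss or backlog on one
        # channel does not delay delivery on any other.
        self.queues = defaultdict(deque)

    def deliver(self, channel: int, payload: bytes) -> bool:
        # Each packet carries exactly one channel identifier.
        if channel not in self.open_channels:
            return False  # unknown channel: drop
        self.queues[channel].append(payload)
        return True

    def open(self, channel: int) -> None:
        # Hypothetical: would follow an exchange on the default channel.
        self.open_channels.add(channel)
```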
That’s mostly it for the packet header.
There’s another, as-yet undefined data point: the ubiquitous reserved bits.
Technically speaking, with a protocol implementation identifier as the first part of the packet, reserved bits could be specified per implementation. However, there are likely some bits we can use across implementations; it’s just not yet entirely clear which. A spin bit is a likely candidate.
Lastly, encapsulating packets in UDP datagrams does not really require any changes to this header. However, if we want to use stream-oriented connections as a fallback for UDP, we’ll need some way to tell where one packet ends and the next begins. Here, a packet length field alone suffices.
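A minimal sketch of such length-prefixed framing over a byte stream, assuming a hypothetical 16-bit big-endian length field (its actual size is not specified here). Incomplete packets stay in the buffer until more bytes arrive from the stream.

```python
import struct

LENGTH = struct.Struct("!H")  # hypothetical 16-bit big-endian length prefix

def frame(packet: bytes) -> bytes:
    if len(packet) > 0xFFFF:
        raise ValueError("packet too large for a 16-bit length field")
    return LENGTH.pack(len(packet)) + packet

def deframe(buffer: bytearray):
    """Yield complete packets from a stream buffer, consuming them.

    A trailing partial packet is left in the buffer for the next read.
    """
    while len(buffer) >= LENGTH.size:
        (length,) = LENGTH.unpack_from(buffer)
        end = LENGTH.size + length
        if len(buffer) < end:
            break  # wait for more bytes from the stream
        packet = bytes(buffer[LENGTH.size:end])
        del buffer[:end]
        yield packet
```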
With that, we’re arriving at four mandatory and two optional packet header fields.
In any specific protocol implementation, this should be fixed-length data, so that offsets into packet buffers can be hardcoded, which in turn enables fast decision-making paths in the code.
- A protocol implementation identifier.
- A destination peer identifier.
- A source peer identifier.
- A channel identifier.
- An optional set of reserved bits.
- An optional payload length.
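To illustrate the fixed-offset point, here is a sketch of one hypothetical implementation's header using Python's `struct` module. The field sizes (a 4-byte implementation identifier, 16-byte peer identifiers, a 16-bit channel, 8 reserved bits, and a 16-bit payload length) are invented for illustration; the suite leaves them to each implementation.

```python
import struct
from typing import NamedTuple

# "!" = network byte order, no padding. Hypothetical field sizes:
#   4s  protocol implementation identifier
#   16s destination peer identifier
#   16s source peer identifier
#   H   channel identifier
#   B   reserved bits
#   H   payload length (only needed on stream transports)
HEADER = struct.Struct("!4s16s16sHBH")

class Header(NamedTuple):
    protocol_id: bytes
    destination: bytes
    source: bytes
    channel: int
    reserved: int
    length: int

def parse(packet: bytes):
    """Split a packet into its fixed-size header and the payload."""
    header = Header._make(HEADER.unpack_from(packet))  # struct.error if too short
    return header, packet[HEADER.size:]
```

Because every offset is fixed, a fast path can also read a single field directly, for example the channel with `struct.unpack_from("!H", packet, 36)`, without parsing the rest of the header.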
I’ve hinted at it throughout this text, but it’s worth repeating explicitly: I expect the communications of this protocol suite to be encapsulated in messages sent across one or more of these channels. Messages will be prefixed by a message identifier, and the message identifier solely determines how the rest of the message is to be interpreted.
That means we’ll effectively specify the rest of the protocol suite as mandatory or optional sub-protocols, each defining their own more limited message set.
There are some processing disadvantages to this, especially when it comes to handshakes and session initiation. But by and large, the advantages outweigh them:
- The ability to treat sub-protocols in isolation, and promote versions independently.
- The ability to provide optional sub-protocols that not every protocol implementation needs to provide.
- The ability to pack multiple messages (of multiple sub-protocols) into a single packet (as long as they all belong to the same channel).
- The ability to support fixed and variable length messages, according to the needs of the protocols.
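A minimal sketch of how several messages could share one packet payload when the message identifier alone determines the layout: fixed-length messages carry no length field, while variable-length ones carry a 2-byte length after the identifier. The registry, the identifier values and the example message kinds are hypothetical.

```python
import struct

# Hypothetical registry: message identifier -> fixed body size in bytes,
# or None for variable-length messages that carry a 2-byte length.
MESSAGE_SIZES = {
    0x01: 8,     # e.g. a fixed-size ping carrying a 64-bit nonce
    0x02: None,  # e.g. variable-length application data
}

def pack_messages(messages) -> bytes:
    """Concatenate (message_id, body) pairs for a single channel."""
    out = bytearray()
    for mid, body in messages:
        size = MESSAGE_SIZES[mid]
        out.append(mid)
        if size is None:
            out += struct.pack("!H", len(body))
        elif len(body) != size:
            raise ValueError(f"message {mid:#x} must be {size} bytes")
        out += body
    return bytes(out)

def unpack_messages(payload: bytes):
    """Yield (message_id, body) pairs; the identifier decides the layout."""
    offset = 0
    while offset < len(payload):
        mid = payload[offset]
        offset += 1
        if mid not in MESSAGE_SIZES:
            raise ValueError(f"unknown message {mid:#x}")
        size = MESSAGE_SIZES[mid]
        if size is None:
            (size,) = struct.unpack_from("!H", payload, offset)
            offset += 2
        body = payload[offset:offset + size]
        if len(body) != size:
            raise ValueError("truncated message")
        offset += size
        yield mid, body
```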
So there you have it.
We’ll have a protocol suite, with a specific packet header layout. We’ll have sub-protocols expressed in messages. And we’ll have specific protocol implementations providing specific sizes to the packet header fields, choosing specific cryptographic properties, and making a choice of sub-protocols.
In the next posts, I’ll explore individual sub-protocols. For practical reasons, I’ll leave cryptography for a later post, even though it’s at the core of some of the above considerations. But with the introduction of the abstract segmented peer identifier, we can forge ahead – a unique bit string per peer is sufficient for most purposes.