Communication Breakdown

Following my last post, in which I worked through various definitions of reliability in networking, it’s time to look at the main protocol contenders and examine how they measure up to those definitions.

Let’s recap quickly with a list. Reliability can mean…

Soft delivery guarantees.
Hard delivery guarantees.
Hard delivery guarantees along the entire path. (This wasn’t mentioned last time, but we’ll get to it.)
Strict ordering of packets. (This was mentioned implicitly as stream- vs. datagram-oriented approaches.)
A preference for local decision making. This one is very much on a gradient, rather than an exclusive either/or feature.
Time-sensitivity. (Here, too, we can distinguish between harder and softer criteria. This is also something we’ll get to.)
Tamper-proofing.
Privacy preservation.
Non-interference of independent communication links.
Failover & Bonding, or multi-pathing.

Ethernet, etc.

Let’s get this out of the way first: we’re not really concerned with protocols that just link two machines together, whether that’s Ethernet, a wireless protocol, or a wired peripheral connection like USB.

It’s not that these things do not matter; on the contrary. The whole idea of an Internet relies on machines talking directly to one another, but also relaying data for machines that have no direct connection with each other. We’re interested in the network layer above these data link protocols, which establishes the relaying.

However, Ethernet is worth mentioning as it is used in e.g. automotive settings with extensions to provide the exact hard time-sensitivity required in these scenarios. Also, because upper layers must rely on the lower layers for hard time-sensitivity, any relay protocol concerned with time-sensitivity must necessarily be aware of the lower layer’s time-sensitivity features along the entire path. This is why none of the upper layers actually implement this.

Internet Protocol (IP)

Our level of interest really starts at networking protocols. The venerable Internet Protocol, in either the current version 6 or the older version 4, largely adds abstract addresses on top of the data link protocols, which allows routing between machines that have no direct physical connection between them.

While it would certainly be possible to discuss an IP replacement, realistically speaking, the entire world networks via IP. There is not much sense discussing the IP feature set when we’ll have to build on top of it.

Suffice to say that IP itself fulfils almost none of the reliability definitions above. The only thing that IP can do is bond physical interfaces, but such bonding setups are intrinsically linked to the data link interfaces, e.g. one can bond two Ethernet networking cards on one machine, and replicate the setup on another machine, to use two physical cables as one virtual link.

At best, this improves reliability at a single hop in a network, and the IP protocol isn’t even concerned with it. It’s more of an operating system feature. Still, it’s worth bringing up for completeness’ sake.

Lastly, extensions to IP such as IPsec can provide tamper-proofing and privacy by effectively establishing a secure tunnel between machines.

We’ll skip the User Datagram Protocol (UDP) except for this sentence: all it does over IP is add source and destination ports, allowing for different addressable services at every machine. That’s an important feature, but has an impact on reliability only if you stretch possible definitions even further.
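To make that one sentence concrete, here is a minimal sketch of what UDP adds on the wire: an 8-byte header with source port, destination port, length and checksum. The helper names `build_datagram` and `parse_datagram` are made up for this illustration, and the checksum is simply left at zero rather than computed.

```python
import struct

# UDP header: four 16-bit big-endian fields (source port, destination port,
# length of header plus payload, checksum), 8 bytes in total.
UDP_HEADER = struct.Struct("!HHHH")


def build_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prefix the payload with a UDP header (hypothetical helper).

    The checksum field is left at 0 for brevity; a real stack computes it.
    """
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def parse_datagram(datagram: bytes) -> tuple[int, int, bytes]:
    """Return (source port, destination port, payload)."""
    src_port, dst_port, length, _checksum = UDP_HEADER.unpack_from(datagram)
    return src_port, dst_port, datagram[UDP_HEADER.size:length]


datagram = build_datagram(50000, 53, b"hello")
print(parse_datagram(datagram))  # (50000, 53, b'hello')
```

Nothing in those eight bytes acknowledges, orders or retransmits anything, which is why UDP contributes so little to the reliability definitions above.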

Transmission Control Protocol (TCP)

TCP is the most widely used internet protocol providing some measure of reliability. Its interpretation of reliability provides soft delivery guarantees and strict ordering of packets. It also contains some measure of non-interference of different TCP connections, whereby each machine tries to schedule all active sessions fairly. This only guarantees non-interference at each machine, however. A hop in the middle may well prioritize some streams over others, providing no overall non-interference guarantee.

While it’s not part of TCP at all, it is certainly possible to also provide tamper-proofing and privacy. Using the widespread Transport Layer Security (TLS) protocol, applications can individually add both. While this is not technically a TCP feature, TLS retains the stream-oriented characteristics of plain TCP, and can therefore for practical purposes be considered a secured version of TCP.

	TCP
Delivery Guarantee
Soft
Hard Local
Hard Path
Strict Ordering
Local Decision Making
Time-Sensitive Networking
Soft
Hard
Tamper-Proofing	()
Privacy Preserving	()
Non-Interference	()
Failover & Bonding (Multipath)	()

Stream Control Transmission Protocol (SCTP)

In many ways, the Stream Control Transmission Protocol is a response to TCP - the motivation section of RFC 4960 actually refers to limitations of TCP. SCTP explicitly loosens the strict ordering requirement, making it optional. It also provides a type of failover called multi-homing in the specs.

On the other hand, there are no explicit non-interference features to SCTP. It provides some congestion control, which can be viewed as a very mild form of this at best. Security can be provided by adding TLS as in TCP, but since the nature of TLS is stream-oriented, it will effectively re-enable the strict ordering SCTP tries to avoid in places. It would be possible to avoid that using DTLS (see below) instead.
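The failover idea behind SCTP's multi-addressed endpoints can be sketched roughly as follows. This is not SCTP's actual state machine: the class names, the `max_retrans` threshold and the round-robin choice of the next path are all assumptions made for illustration, and the addresses are documentation placeholders.

```python
from dataclasses import dataclass


@dataclass
class Path:
    address: str
    failures: int = 0  # consecutive timeouts observed on this path


@dataclass
class MultiHomedPeer:
    """Hypothetical model of one peer that is reachable via several addresses."""
    paths: list[Path]
    max_retrans: int = 3  # illustrative threshold, not a value from any spec
    primary: int = 0      # index of the path currently used for sending

    def active_path(self) -> Path:
        return self.paths[self.primary]

    def on_ack(self) -> None:
        # Any acknowledgement proves the active path works again.
        self.active_path().failures = 0

    def on_timeout(self) -> None:
        path = self.active_path()
        path.failures += 1
        if path.failures > self.max_retrans:
            self._fail_over()

    def _fail_over(self) -> None:
        # Try the other paths in order; keep the current one if none is healthy.
        for offset in range(1, len(self.paths)):
            candidate = (self.primary + offset) % len(self.paths)
            if self.paths[candidate].failures <= self.max_retrans:
                self.primary = candidate
                return


peer = MultiHomedPeer([Path("192.0.2.1"), Path("198.51.100.1")])
for _ in range(4):
    peer.on_timeout()
print(peer.active_path().address)  # 198.51.100.1
```

Note that the decision to switch is made purely from local observations (timeouts and acknowledgements), which is what makes this kind of failover attractive.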

	TCP	SCTP
Delivery Guarantee
Soft
Hard Local
Hard Path
Strict Ordering		()
Local Decision Making
Time-Sensitive Networking
Soft
Hard
Tamper-Proofing	()	()
Privacy Preserving	()	()
Non-Interference	()
Failover & Bonding (Multipath)	()

Hypertext Transfer Protocol (HTTP)

When we’re discussing general transport protocols, it shouldn’t really make any sense to mention the web’s Hypertext Transfer Protocol. But since the current generation of software engineers knows nothing but the web, and developing for the web has been simplified the most, HTTP is used everywhere. That includes cases for which it is not ideally suited, such as streaming applications. The trend therefore is to make HTTP more and more widely applicable, which complicates the specifications to no end.

It also means it’s difficult to speak about HTTP as a single spec. With regard to reliability guarantees, however, all versions of HTTP on some level behave as TCP does. HTTP/0.9 to HTTP/2 do so because they use TCP as the lower transport layer. Various versions of HTTP either can be tunnelled through TLS, or are tightly integrated with TLS, providing security features.

QUIC, the transport underneath the upcoming HTTP/3, replaces TCP with UDP, but then re-adds the reliability features of TCP, with a twist. Instead of being concerned with fairness between different sessions, it is concerned with non-interference of different channels within the same session. But as UDP is used, sessions as such do not exist on the transport layer, which means QUIC treats every application-level, abstract connection as a multi-channel session.
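The difference between one strictly ordered stream and several independently ordered channels can be shown with a toy receiver. This is only a sketch of the principle, not QUIC's actual stream handling; `OrderedReceiver` and `ChannelReceiver` are names invented for this example.

```python
from collections import defaultdict


class OrderedReceiver:
    """Delivers payloads in sequence order, holding back anything that arrives early."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.pending: dict[int, bytes] = {}

    def receive(self, seq: int, data: bytes) -> list[bytes]:
        if seq < self.next_seq:
            return []  # duplicate of something already delivered
        self.pending[seq] = data
        delivered = []
        # Release the longest gap-free run starting at next_seq.
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


class ChannelReceiver:
    """One OrderedReceiver per channel id, so a gap only stalls its own channel."""

    def __init__(self) -> None:
        self.channels = defaultdict(OrderedReceiver)

    def receive(self, channel: int, seq: int, data: bytes) -> list[bytes]:
        return self.channels[channel].receive(seq, data)


# Single stream: packet 0 (video) is lost, so packet 1 (audio) is held back.
single = OrderedReceiver()
print(single.receive(1, b"audio"))    # []

# Two channels: the missing video packet on channel 0 does not delay channel 1.
multi = ChannelReceiver()
print(multi.receive(1, 0, b"audio"))  # [b'audio']
```

In the single-stream case the audio data only becomes available once the lost packet has been retransmitted; with channels, each flow waits only for its own gaps.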

No version of HTTP contains failover features, though a multi-pathing extension to QUIC is in development.

	TCP	SCTP	HTTP
Delivery Guarantee
Soft
Hard Local
Hard Path
Strict Ordering		()
Local Decision Making
Time-Sensitive Networking
Soft
Hard
Tamper-Proofing	()	()	()
Privacy Preserving	()	()	()
Non-Interference	()		()
Failover & Bonding (Multipath)	()

Datagram Transport Layer Security (DTLS)

Previously, I mentioned TLS as if it were practically an extension of other protocols - and when TLS is used over TCP, applications can certainly treat it as such. It’s worth briefly looking at its datagram-oriented cousin DTLS.

As a datagram-oriented protocol, it offers just about the same guarantees as (UDP/)IP, except adding security features. One interesting point is that there are attempts in the specs to make it more able to make local decisions. That is, unlike TLS which relies on the underlying protocol providing delivery guarantees, DTLS understands that such guarantees do not exist, and each node must act according to local knowledge to the best of its ability.

Unfortunately, the DTLS handshake is complex and sends large packets, requiring some short-lived stream-like capabilities within the protocol. Even more unfortunately, this handshake must be repeated when what passes for a connection in datagram protocols is sufficiently lost, which makes DTLS less suited to high packet loss scenarios than one would like. A protocol with stronger local decision making guarantees would be more robust here.

	TCP	SCTP	HTTP	DTLS
Delivery Guarantee
Soft
Hard Local
Hard Path
Strict Ordering		()
Local Decision Making				()
Time-Sensitive Networking
Soft
Hard
Tamper-Proofing	()	()	()
Privacy Preserving	()	()	()
Non-Interference	()		()
Failover & Bonding (Multipath)	()

WireGuard

Having mentioned IPsec and DTLS as datagram-oriented protocols that add security features, it’s also worth looking at the latest contender, WireGuard. There are a lot of small differences between it and DTLS, but like DTLS, it tends to be used over UDP. It’s also specifically designed for building secure tunnels, much like IPsec.

The main difference to DTLS is that it avoids the complex handshake almost entirely. The way it does so is simple: it relies on a prior channel for exchanging keys ahead of time, so that each peer already knows the other’s static public key. That makes WireGuard very robust when used as a tunnel, but unfortunately makes it very hard to use it as an application-level protocol for connecting to arbitrary other machines.

As a consequence, however, in direct comparison to DTLS it offers purely local decision making. A key is either known or not known.

	TCP	SCTP	HTTP	DTLS
Delivery Guarantee
Soft
Hard Local
Hard Path
Strict Ordering		()
Local Decision Making				()
Time-Sensitive Networking
Soft
Hard
Tamper-Proofing	()	()	()
Privacy Preserving	()	()	()
Non-Interference	()		()
Failover & Bonding (Multipath)	()

Conclusion and Outlook

Having examined some protocols in detail, and others in passing, it should be apparent that no single solution currently satisfies most or all reliability requirements. However, almost all of the ones described here offer partial solutions which, when integrated into a single protocol, would do (most of) the job.

As a high-level outline, the choices I’ve made are these:

Start with UDP as the transport. This has two advantages: on the one hand, routing equipment is already in place all over the Internet, and on the other hand, it’s an application-level transport.
Use most of WireGuard for tamper-proofing and privacy. We will need to add a key exchange handshake. Luckily, WireGuard’s security concepts are based on the Noise Protocol Framework, which has provisions for such a handshake. We can learn from DTLS and WireGuard how to implement one securely. Once keys are exchanged, re-negotiating lost connections can use WireGuard’s mechanism, leading to almost local decision making, except for that initial discovery of new peers.
Use multi-pathing in much the same way that SCTP implements multi-homing.
Use channels much as in QUIC aka HTTP/3, allowing for application-level non-interference. In fact, except for key (re-)negotiations, we can make the WireGuard-derived security features apply per channel.
We can add optional packet-level delivery guarantees, and we can even make them locally hard guarantees in how we implement them at the send buffer level.
At this point, there is no stream-like strict ordering - though we’d only have to add optional sequence numbers to the wire presentation, as well as some logic on how to treat these packets.
We can add negotiation of hard delivery options along the path - though the actual guarantee relies on all nodes along the path cooperating. There are also implementation challenges to this. Since it’s negotiable, best treat it as a protocol extension.
Lastly, we cannot provide hard time-sensitivity on an Internet composed of many hops outside of our control. However, similar to delivery guarantees along the path, we can provide a protocol extension for negotiating timing requirements.
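To give a feel for how channels and optional sequence numbers could share one wire representation, here is a rough sketch of a packet header with a channel identifier, a flags byte, and a sequence number that is only present when a flag says so. Every field size, the flag value and the function names are assumptions for illustration; this is not the channeler wire format.

```python
import struct
from typing import Optional

FLAG_HAS_SEQUENCE = 0x01  # hypothetical flag bit: a sequence number follows

_FIXED = struct.Struct("!HB")  # channel id (16 bit), flags (8 bit)
_SEQ = struct.Struct("!I")     # optional 32-bit sequence number


def encode(channel: int, payload: bytes, seq: Optional[int] = None) -> bytes:
    """Build a packet; the sequence number is only written when one is given."""
    flags = FLAG_HAS_SEQUENCE if seq is not None else 0
    header = _FIXED.pack(channel, flags)
    if seq is not None:
        header += _SEQ.pack(seq)
    return header + payload


def decode(packet: bytes) -> tuple[int, Optional[int], bytes]:
    """Return (channel id, sequence number or None, payload)."""
    channel, flags = _FIXED.unpack_from(packet)
    offset = _FIXED.size
    seq = None
    if flags & FLAG_HAS_SEQUENCE:
        (seq,) = _SEQ.unpack_from(packet, offset)
        offset += _SEQ.size
    return channel, seq, packet[offset:]


print(decode(encode(7, b"unordered")))        # (7, None, b'unordered')
print(decode(encode(7, b"ordered", seq=42)))  # (7, 42, b'ordered')
```

Packets without the flag stay pure datagrams and cost three header bytes in this sketch; packets with it carry enough information for a receiver to restore stream-like ordering per channel.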

	TCP	SCTP	HTTP	DTLS	Proposed
Delivery Guarantee
Soft					()
Hard Local					()
Hard Path					()
Strict Ordering		()			()
Local Decision Making				()	()
Time-Sensitive Networking
Soft
Hard
Tamper-Proofing	()	()	()		()
Privacy Preserving	()	()	()		()
Non-Interference	()		()
Failover & Bonding (Multipath)	()

In the next posts, I will provide draft designs for the above points. I shy away from calling them full-fledged specs, because often enough plenty of minor changes become necessary during implementation.

All this work will eventually be part of the channeler protocol.