Yesterday, we explored the differences between a Merkle DAG and vessel’s DAG. Today’s topic is how combining wyrd’s conflict-free replicated data type (CRDT) with vessel yields a specific kind of CRDT, namely a DAG-based one.

## Conflict-Free Replicated Data Types

A quick recap on CRDTs first.

They’re data types, mostly simple ones such as counters or sets. And they’re conflict-free, meaning that multiple parties can modify them concurrently; once their respective modifications are synchronized, each party with the full set of modifications reconstructs the same state.

This permits offline-first or local-first operation with eventual consistency, which is powerful for all kinds of scenarios in which computers may not have connectivity to the Internet at all times – applications range from mobile computers such as phones or drones, all the way to remote areas where connectivity is highly intermittent.
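To make the recap concrete, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter. This is purely illustrative and not wyrd’s implementation; the class and method names are made up for this example.

```python
# A grow-only counter (G-Counter): each replica increments only its own
# slot, and merging takes the element-wise maximum. All replicas converge
# to the same total regardless of synchronization order.

class GCounter:
    def __init__(self, replica_id, num_replicas):
        self.replica_id = replica_id
        self.slots = [0] * num_replicas

    def increment(self):
        self.slots[self.replica_id] += 1

    def merge(self, other):
        self.slots = [max(a, b) for a, b in zip(self.slots, other.slots)]

    def value(self):
        return sum(self.slots)

# Two replicas count events concurrently, possibly offline...
a = GCounter(0, 2)
b = GCounter(1, 2)
a.increment(); a.increment()   # replica 0 sees two events
b.increment()                  # replica 1 sees one event

# ...and converge to the same state once both have seen all modifications.
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 3
```

Note that neither replica needed to coordinate with the other while counting; convergence happens entirely at merge time.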

There are, roughly speaking, two types of CRDTs:

- State-based, or convergent CRDTs, and
- Operation-based, or commutative CRDTs.

In practice, you can derive one type of CRDT from the other; they’re not mutually exclusive.

### Convergent CRDTs

Convergent CRDTs rely on a *merge* function that, given two replicas of the data type’s state, can produce a new state. Such a merge function has to be commutative, associative and idempotent. Effectively, merging states A, B and C in any order must always produce the same result – i.e. `merge(A, merge(B, C))` and `merge(merge(C, A), B)` must both produce the same result D.
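A classic example of such a merge function is set union, which has all three properties. The following sketch (illustrative names, not vessel or wyrd code) checks them directly:

```python
# Set union as a state-based CRDT merge: commutative, associative, and
# idempotent, so the order (and repetition) of merges never matters.

def merge(s1, s2):
    return s1 | s2  # set union

A = {"x"}
B = {"y"}
C = {"x", "z"}

# Merging in any order produces the same result D.
D1 = merge(A, merge(B, C))
D2 = merge(merge(C, A), B)
assert D1 == D2 == {"x", "y", "z"}

# Idempotence: merging a state with itself changes nothing.
assert merge(D1, D1) == D1
```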

The downside of convergent CRDTs is that they must always synchronize the complete states (A, B and C), which depending on the data they hold can be quite large. In practice, that means most CRDTs are commutative.

### Commutative CRDTs

Commutative CRDTs drop the notion of the *merge* function, and instead
define a set of operations that can be performed on the data type – for a
counter, for example, that might be an *increment* operation. These operations
have to be commutative and associative, but no longer idempotent.

The difficulty here is how you ensure that all operations remain commutative
and associative, also between each other. That is most easily demonstrated by
a *set*, which defines an *add* and a *remove* operation.

Adding a value A to the set, and then adding a value B to the set, or doing this in reverse order, both yields the same set. Addition is commutative and associative.

Removing a value A from this set, and removing B, again in any order, both yields the same result. Removal is also commutative and associative.

But what if we mix the operations?

Starting from an empty set, let’s first add A, then remove it. The result is an empty set. On the other hand, if we start by trying to remove A from an empty set, the result is still an empty set; if we subsequently add A, reversing the two operations, we get a set containing A – a different result, even though each operation is in itself commutative and associative.
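The ordering problem above can be demonstrated in a few lines. This is a toy replayer, not any real CRDT library:

```python
# Replaying the same two operations in different orders leaves different
# results -- even though add and remove are each commutative on their own.

def apply(ops):
    s = set()
    for op, value in ops:
        if op == "add":
            s.add(value)
        else:
            s.discard(value)  # removing a missing element is a no-op
    return s

# add then remove: empty set.
assert apply([("add", "A"), ("remove", "A")]) == set()
# remove then add: a set containing A.
assert apply([("remove", "A"), ("add", "A")]) == {"A"}
```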

One of the simpler ways of solving this dilemma is to define an order to the operations.

If we have a single replica, that is simple enough: operations will simply be applied in the order in which they were issued and recorded. But when synchronizing operations between replicas, we have to find an order to operations that were performed in parallel!

### Vector Clocks

Enter vector clocks.

It’s best to avoid relying on time synchronization between replicas, which leaves only logical clocks. Every replica increments their logical clock for every event (either an operation, or a synchronization). So far, so good.

Then all replicas store their clock at a fixed index in a vector, and each replica gets its own index. The vector is synchronized with the data. Recipients update each element of the vector by taking the maximum of the local and received value.

Finally, a vector clock is considered less than another if every one of its elements is less than or equal to the other’s corresponding element, and at least one is strictly less. With that, we have established a logical ordering.
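The merge and comparison rules just described fit in a few lines. A sketch, with illustrative function names:

```python
# Vector clock operations: on receipt, take the element-wise maximum.
# One clock is "less than" another if no element is greater and at least
# one is strictly less; if neither dominates, the events were concurrent.

def vc_merge(v1, v2):
    return [max(a, b) for a, b in zip(v1, v2)]

def vc_less(v1, v2):
    return all(a <= b for a, b in zip(v1, v2)) and any(
        a < b for a, b in zip(v1, v2)
    )

assert vc_less([1, 0], [1, 1])      # [1, 0] happened before [1, 1]
assert not vc_less([2, 0], [1, 1])  # neither dominates: concurrent events
assert not vc_less([1, 1], [2, 0])
assert vc_merge([2, 0], [1, 1]) == [2, 1]
```

The concurrent case, where neither clock is less than the other, is exactly the situation where an additional ordering rule is needed.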

There’s a catch. Isn’t there always?

Vector clocks require one element per replica, which either limits the number of replicas you can support, or means the clock becomes very large, very complicated, or at worst both.

### Merkle CRDTs

As an alternative to vector clocks, there exist Merkle CRDTs.

These CRDTs rely on the properties of a Merkle DAG to order operations.
Specifically, they define a so-called *Merkle clock*: a logical clock in which
every node represents an event. The DAG then produces a specific order of
events: a new event creates a new root, which takes the previous root(s) as
children. In this way, every node is a *later* event than any of its
descendants.

Merkle clocks are themselves a kind of state-based CRDT; the merge operation describes how two Merkle DAGs are combined:

- If T1 is equal to T2, they’re the same DAG and no merge is required.
- If T1 is included in T2, we keep T2.
- If T2 is included in T1, we keep T1.
- If neither is the case, we keep both roots T1 and T2. A new event then creates a new root with T1 and T2 as children.

One of the interesting aspects here is that last step. It makes perfect sense to merge two trees by creating a new root with both as children. But there is no causal relationship between the two roots, so the order in which they are added as children is arbitrary. It must, however, be deterministic for multiple replicas to produce the same merged root. The simplest way to achieve this is bitwise ordering of the root hashes, but any deterministic method will do.
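A sketch of that deterministic tie-break, assuming nodes are identified by a hash over their payload and children (the helper names here are hypothetical, not from any Merkle CRDT implementation):

```python
import hashlib

def node_hash(payload, children):
    # A node's identity covers its payload and its children's hashes,
    # as in any Merkle structure.
    h = hashlib.sha256()
    h.update(payload)
    for child in children:
        h.update(child)
    return h.digest()

def merge_roots(t1, t2, payload):
    # Order the concurrent roots deterministically -- here by bitwise
    # comparison of their hashes -- so every replica builds the same
    # new root node.
    children = sorted([t1, t2])
    return node_hash(payload, children)

r1 = node_hash(b"event-1", [])
r2 = node_hash(b"event-2", [])

# Both replicas compute the identical merged root, regardless of which
# argument order they happen to use.
assert merge_roots(r1, r2, b"merge") == merge_roots(r2, r1, b"merge")
```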

But let’s play this out a little.

Suppose you start with an empty set. As the first operation, you add A, which is an event that creates a new root T1.

You synchronize the set and root.

Now one replica adds A again, creating a new root T2. The other replica removes A, creating a new root T3.

You synchronize again.

If the ordering of T2 and T3 is such that the removal logically occurs before the addition, your result is a set with the element A. If the ordering is such that the addition logically occurs before the removal, your result is an empty set.

What you *can* say is that both replicas will have the same result.

It should be noted that using vector clocks does not provide immunity from this
effect. The point of logical clocks is to *disambiguate and provide consistency*,
but they can still surprise the user.

## Vessel’s DAG as a Logical Clock

In much the same way as Merkle clocks, vessel’s DAG provides a logical order, not so much of events but of container extents. Its properties are different, however.

First, one does not receive a new root for a new event, and so the way Merkle clocks are compared and merged does not work. But the DAG brings its own algorithm for merging, and for selecting how to add new events/extents! And that’s really all that is required here.

Second, extents are relatively large – compared to the data to encode a single CRDT operation, at any rate. So we can squash a number of operations into a single extent. We now have a two-tier logical ordering:

- In the first tier, the order of extents from the vessel DAG applies, and
- within each extent, the order of operations is the order in which they appear in the extent payload.
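The two-tier ordering amounts to a simple composite sort key. A sketch, with illustrative extent numbering (the real extent order comes from the vessel DAG):

```python
# Two-tier ordering: sort operations first by their extent's position in
# the totally ordered sequence of extents, then by their position within
# the extent payload.

ops = [
    {"extent": 1, "index": 0, "op": "remove A"},
    {"extent": 0, "index": 1, "op": "add B"},
    {"extent": 0, "index": 0, "op": "add A"},
    {"extent": 1, "index": 1, "op": "add C"},
]

ordered = sorted(ops, key=lambda o: (o["extent"], o["index"]))
assert [o["op"] for o in ordered] == ["add A", "add B", "remove A", "add C"]
```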

And this is precisely how wyrd uses vessel. In the next post, we’ll explore what that means in practice.