Lessons learned from implementing Raft in Rust
A little more than a week ago, while I was working on my toy OS project in one of the Recurse Center’s side rooms, a few other Recursers entered the room to discuss a shared effort toward implementing the Raft consensus algorithm. The discussion immediately caught my attention: I knew vaguely about Raft, and distributed systems are of particular interest to me. I was quickly drawn into the group and excited by the new project.
In this first meeting, we (the “Raft Implementers club”) reviewed the paper’s proof of safety in log replication. The others had read the paper prior to the meeting and talked about various points of confusion. They were planning to write implementations alternatively in Go, Julia, Haskell, and Elixir. I decided that I would have a go at implementing the algorithm as well, using Rust. That night, on the train home (and extending to the morning commute), I read the paper describing the Raft protocol and familiarized myself with the various rules.
raft.rs
After two false-starts, raft.rs now has stable cluster communication and leader election. Once I had the program compiling, I added quite a bit of structured logging so that I can have confidence in the program’s behavior. Remaining work on the core algorithm includes accepting client commands and tracking peer log indices. I also need to replace the message serialization method with Protobuf; currently I’m simply encoding/decoding via colon-delimited plaintext strings. I’d also like to move to finer grained locking: presently there is one lock protecting the server’s whole internal state. It should be possible to answer read-only requests concurrently with a read/write lock.
Besides the fun of implementing the algorithm, this was a great growth exercise
as a Rust programmer. It used to be the case that I would write a bit of code
and run cargo check
to learn about any borrowck or lifetime issues. After a
few days on this project, though, I was able to recognize and reason out such
problems while structuring the code. My first and second attempts mired in
deciding how to architect the system to satisfy the algorithm: how to represent
messages, timers, state, and object relationships. By my third attempt, having
resolved these design choices and learned to reason out ownership, I was able to
write outthe majority of the program without ever running it through the borrow
checker. That’s an immensely satisfying position!
Takeaways from Rust
Rust is a fantastic language. The standard library is a joy to use: builtin modules, traits, types, and methods are cohesive and well organized. The documentation and its rendering are excellent. The ecosystem is large enough now to provide multiple solutions for common functionality like argument parsing, logging, working with dates and time, and object (de)serialization.
Here are a few things I’ve learned so far by working on this project:
-
Writing a macro isn’t nearly as hard as I thought, and they’re even more useful than I imagined. I used macros to reduce boilerplate and as a convenience for building structures. I was pleasantly surprised to find a similar pattern in use in the standard library.
-
A
'static
lifetime parameter doesn’t mean what I thought: I thought it required that the value must exist for the entire program, from initialization to termination. In fact, it only specifies that the value is allowed to live that long (reference). Admittedly, this is still a bit fuzzy to me, especially since the definition given in the book seems to agree with my original understanding. It may be that my confusion arises because lifetimes can be considered from the perspective of the value (which has a particular lifetime) or from the perspective of the item acting on the value (which describes a maximum lifetime bound). -
Working with dynamically typed data structures is a bit of a pain. Raft has three message types, each of which consists of a request/response part. I wanted my functions to be generic across these, and getting that right took a bit of trial and error. In my first attempt, I wrote parametric functions over an empty trait:
fn foo<T: RaftMsg>(...) { ... }
. Unfortunately, it isn’t possible to usematch
patterns on such a type due to monomophization. In fact, the more general problem of usingenum
variants as types is a long-standing feature request and is properly called refinement types. For now, I’m using separateenum
s for each of the message type, part, and data. The latterenum
wraps thestructs
themselves so that I can use pattern matching to destructure the message and access the data. -
By implementing a periodic
Timer
, I feel like I finally have an understanding of how condition variables work. Really, they don’t appear to be anything more than a useful interface defined over aMutex
. -
Locking plays nice with Rust’s borrowck guarantees. A handful of various types available in
std::sync
provide aget_mut
method that ensures read/write and thread-safety statically. This means that using this method to acquire theRwLock
protecting my server state is effectively a no-op: the compiler’s guarantees are just as strong as the lock’s. That’s awesome. -
Properly done, a type system is absolutely amazing. When I dabbled in Haskell a few weeks back, I was introduced to the functor, applicative, monad hierarchy at an abstract level. Rust has solidified these concepts for me through use of the
Option
andResult
types, along with the variousIterator
adaptors. These are powerful, expressive constructs that Rust uses pervasively.
The more I use Rust, the more I enjoy it. Implementing Raft has been a very rewarding exercise, touching on issues of system design, protocol serialization, network communication, locking, and concurrency. There’s been some debate about whether or not it actually describes a distributed system (versus replicating just a single state), but in any case it’s been a fun project.