Jake Robert Read


MUDLink: a Modular UART Duplex Link

MUDLink on GitHub / Arduino

I built a new library for Arduino that lets us promote any Serial port (UART) into a competent link / transport layer with guaranteed delivery of framed packets.

MUDLink adds packet framing, delivery guarantees, and flow control to just about any UART port. It also records stats on the link’s performance, to enable a little bit of feedback during systems assembly.

I have wanted to release something like this for a while since just about every microcontroller has at least one UART port, and they are well understood in the OSHW community - making them a good candidate for link layers that could help to enable a commons of interoperable hardware. In order to get from UART to Reliable Link Layer, though, we need to add a few things:

Packet Framing

UARTs send bytes only and, unlike e.g. I2C or CANBus, have no native packet delineation: this makes it difficult to frame and decode efficient bytecodes of any real complexity.

Bit and Byte Errors

UARTs are not fault tolerant: each microcontroller simply sends bytes at whatever baudrate it is configured for, and receives at the same rate. If bytes are lost or bits are flipped, and programmers don’t anticipate or catch these errors, things can go wrong. This means that folks are left either with unreliable systems (throwing intermittent errors, the worst kind!), or with the job of re-building fault tolerance when they might rather be finishing the interesting parts of their project.

Flow Control and Feedback

Since UARTs are mostly used with only TX and RX lines (no CTS/RTS etc.), they are not normally flow controlled. This can be troublesome in applications where, e.g., one participant can write significantly faster than the other can read (leading to lost bytes).

It can also be difficult to tell what an appropriate baudrate is: lower rates are typically less prone to errors, but that stability comes at an obvious performance loss. In some cases, we pick what seems like a stable baudrate only to change our environment slightly, and find that it is actually quite error prone… for example we might turn on a high-power motor (injecting a lot of noise) and have our comms break down.

These kinds of phenomena typically mean that (as is the case in much of systems assembly) we end up using lower-performance parameters than we might be able to, e.g. opting for baudrates that never fail, even though they may run at only one quarter the speed of baudrates that fail only very occasionally.

Enter MUDLink, a Useful UART Link

So, MUDLink is a software bolt-on that effectively promotes any Arduino-exposed Serial object from a PHY (Layer 1 of the OSI model) into a simple Transport layer. It roughly eschews Layer 3: since it only sends datagrams between two devices, there is no addressing or routing… it also skips segmentation, which is meant to live in Layer 4.

Any longer discussion of how closely this resembles the OSI model is mostly moot anyway. As many have stated (most recently, Nathan made this point in his Machine Class lecture), the OSI model is a good common understanding or theory of how we might layer networks, but it is rarely followed carefully in practice.

In simple terms, MUDLink lets you quickly establish a packetized, reliable and inspectable connection between two microcontrollers.

Packet Framing with COBS

Firstly, we need to make streams of bytes into chunked packets. This is like networking 101: if we don’t know where any given byte goes w/r/t any other byte, we can’t reliably do much (unless each of our messages is just one byte long).

From the Wikipedia page on COBS: COBS transforms an arbitrary string of bytes in the range [0,255] into bytes in the range [1,255]. Having eliminated all zero bytes from the data, a zero byte can now be used to unambiguously mark the end of the transformed data.

COBS stands for Consistent Overhead Byte Stuffing, and it is already in most embedded developers’ toolboxes. It has the excellent property that it requires only two extra bytes of transmitted data for any packet of up to 254 bytes (one code byte plus the zero delimiter), and only one more byte per additional chunk of 254 thereafter. It requires those bytes consistently, meaning that we can build more deterministic systems.

COBS is also simple and fast: about 64 lines of C code (with comments). MUDLink uses the Wikipedia page’s implementation directly (unfortunately I couldn’t tell you who wrote it, though: thanks to them!).
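To make the framing concrete, here is a minimal C++ sketch of COBS encode and decode following the standard algorithm. It was written for this post (`cobsEncode`, `cobsDecode`, and `cobsMaxEncodedLen` are illustrative names), not copied from the MUDLink source, and it leaves appending the 0x00 delimiter to the caller.

```cpp
#include <cstddef>
#include <cstdint>

// Upper bound on encoded size (excluding the 0x00 delimiter):
// roughly one code byte per 254 data bytes, and at least one.
size_t cobsMaxEncodedLen(size_t len) { return len + len / 254 + 1; }

// Encode len bytes from src into dst, removing all zeros.
// Returns the encoded length; dst needs cobsMaxEncodedLen(len) bytes.
size_t cobsEncode(const uint8_t* src, size_t len, uint8_t* dst) {
  size_t codeIdx = 0;  // where the current block's code byte will go
  size_t out = 1;      // next write position (slot 0 holds the first code byte)
  uint8_t code = 1;    // 1 + number of data bytes in the current block
  for (size_t i = 0; i < len; i++) {
    if (src[i] != 0) {
      dst[out++] = src[i];
      code++;
    }
    if (src[i] == 0 || code == 0xFF) {
      dst[codeIdx] = code;  // close the block: code = distance to the next zero
      code = 1;
      codeIdx = out;
      // Reserve a slot for the next code byte, unless a full 254-byte
      // block ended exactly at the end of the input.
      if (src[i] == 0 || i + 1 < len) out++;
    }
  }
  if (codeIdx < out) dst[codeIdx] = code;  // write the final code byte
  return out;
}

// Decode len encoded bytes (delimiter already stripped) into dst,
// which needs at least len bytes. Returns false on malformed input.
bool cobsDecode(const uint8_t* src, size_t len, uint8_t* dst, size_t* outLen) {
  size_t in = 0, out = 0;
  while (in < len) {
    uint8_t code = src[in++];
    if (code == 0) return false;  // zeros never appear inside an encoded frame
    for (uint8_t k = 1; k < code; k++) {
      if (in >= len || src[in] == 0) return false;  // truncated or stray zero
      dst[out++] = src[in++];
    }
    // Every block shorter than 254 data bytes stood in for a zero,
    // except the last block in the frame.
    if (code != 0xFF && in < len) dst[out++] = 0;
  }
  *outLen = out;
  return true;
}
```

For example, the bytes `11 22 00 33` encode to `03 11 22 02 33`: each code byte says how far it is to the next (removed) zero.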

Error Catching with CRC16-CCITT

Once we have rx’d a sequence of bytes, we need to check that they are valid, since things may have gone awry while they were en route (flipped bits due to noise or missed bytes due to slow interrupts being the most common culprits).

For this, MUDLink uses a Cyclic Redundancy Check, specifically the CRC16-CCITT polynomial. For a really well done explainer on CRCs, check out Ben Eater’s video on the topic.

A CRC is like a really fancy checksum: we use a carefully selected function that takes our packet (as a big number) and generates a smaller number that is guaranteed to change if certain kinds of changes to the packet occur. For example, the CRC16-CCITT is guaranteed to change if a single bit changes anywhere within a packet of 8000 bytes. So, pretty good. It’s kind of like a hashing function, but different in important ways. Real big-brain math stuff going on in there. I would like to point out that I understand the topic only glancingly, and you should refer to others for more detailed and correct explainers (e.g. Ben Eater, above).

CRCs are additionally designed to be quickly implemented: in theory they are basically big long divisions but in practice they are calculated using bitwise XOR operations and bit-shifts.
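Here is a minimal bit-at-a-time sketch of CRC16-CCITT, showing the shift-and-XOR loop that stands in for the long division. The polynomial 0x1021 is the CCITT one; the initial value 0xFFFF (the common "CCITT-FALSE" variant) is an assumption, since this post doesn’t say which initial value MUDLink uses.

```cpp
#include <cstddef>
#include <cstdint>

// CRC16-CCITT: polynomial x^16 + x^12 + x^5 + 1 (0x1021), no bit reflection.
// Initial value 0xFFFF is assumed here; 0x0000 is another common choice.
uint16_t crc16_ccitt(const uint8_t* data, size_t len, uint16_t crc = 0xFFFF) {
  for (size_t i = 0; i < len; i++) {
    crc ^= static_cast<uint16_t>(data[i] << 8);  // bring the next byte into the top 8 bits
    for (int bit = 0; bit < 8; bit++) {
      // One step of the "long division": if the top bit is set,
      // shift and subtract (XOR) the polynomial; otherwise just shift.
      if (crc & 0x8000) {
        crc = static_cast<uint16_t>((crc << 1) ^ 0x1021);
      } else {
        crc = static_cast<uint16_t>(crc << 1);
      }
    }
  }
  return crc;
}
```

With this variant, the standard check string "123456789" produces 0x29B1; flipping any single bit of the input gives a different result. Table-driven versions trade 512 bytes of lookup table for fewer operations per byte.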

Flow Control and Delivery Guarantees

So! We can catch packets and make sure they are legit. Now we want the real magic: we want to know from our end (1) that our packet was successfully received and (2) that it has been consumed by our partner, so that they are clear and free to receive the next message. We accomplish both goals with an ack-and-retransmit scheme not dissimilar from TCP itself, which is again explained really well by Ben Eater.

The MUDLink code is essentially four tiny state machines that signal one another, which I’ve tried to represent here. The left side is application-facing states and actions: the programmer can tell if the port is open (not in a failed state), and if it is clear to send or clear to read a packet. On the right side we have, basically, hardware buffers. These send data out into the world and are responsible for catching it. Either side can generate and consume events from the other: for example, when we finish RX’ing a packet that passes the CRC, it sets the user-facing CTR state. When the user finally reads the packet, the CTR state transitions to EMPTY, and an ACK is generated and written into the TX buffer. Keeping a clear separation of these layers normally helps us separate ISRs from user code, but in this case it just helps clarify the system operation.
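The receive half of that picture can be sketched as one of those tiny state machines. This is a hypothetical illustration (`RxSide`, `onValidPacket`, and the field names are made up here, not MUDLink’s internals): a CRC-valid packet moves the app-facing state from EMPTY to CTR, reading it moves it back to EMPTY and queues an ack, and a retransmit of an already-read packet is simply re-acked.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch of the receive side of a MUDLink-style link.
enum class RxAppState { EMPTY, CTR };  // CTR = Clear To Read

struct RxSide {
  RxAppState state = RxAppState::EMPTY;
  uint8_t buf[255];
  size_t len = 0;
  uint8_t pendingSeq = 0;   // TXD_SEQ of the packet waiting in buf
  uint8_t lastReadSeq = 0;  // TXD_SEQ of the last packet the app consumed
  bool haveRead = false;    // false until the app has read anything
  bool ackPending = false;  // true when an ack should be written to the TX buffer
  uint8_t ackSeq = 0;       // sequence number that ack should carry

  bool clearToRead() const { return state == RxAppState::CTR; }

  // Called once a frame has been COBS-decoded and has passed its CRC check.
  void onValidPacket(uint8_t txdSeq, const uint8_t* payload, size_t n) {
    if (n == 0) return;                       // ack-only / keepalive: nothing for the app
    if (state == RxAppState::CTR) return;     // still holding an unread packet: drop this one
    if (haveRead && txdSeq == lastReadSeq) {  // retransmit of a packet we already read:
      ackPending = true;                      // the sender probably missed our ack
      ackSeq = txdSeq;
      return;
    }
    len = n < sizeof(buf) ? n : sizeof(buf);
    memcpy(buf, payload, len);
    pendingSeq = txdSeq;
    state = RxAppState::CTR;                  // present the data to the application
  }

  // Application-facing read: consuming the packet is what triggers the ack.
  size_t read(uint8_t* dest, size_t maxLen) {
    if (state != RxAppState::CTR) return 0;
    size_t n = len < maxLen ? len : maxLen;
    memcpy(dest, buf, n);
    state = RxAppState::EMPTY;
    lastReadSeq = pendingSeq;
    haveRead = true;
    ackPending = true;
    ackSeq = lastReadSeq;
    return n;
  }
};
```

Dropping an incoming packet while CTR is set is safe in a stop-and-wait scheme like this: the sender gets no ack for it, so it will retransmit later.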

TCP is a lot more complex than this (MUDLink is only 350 lines of code as of this post), but the two use similar principles:

  • We keep a rolling sequence number that increments when we generate a new message.
    • We always transmit our most recent sequence number, as well as the sequence number of the packet we have most recently read from our partner (as an ack).
  • When we transmit a message, we wait until we have received an ack from our partner before we transmit any new data.
    • We may, however, retransmit the same message if our timeout for it runs out.
    • After each timeout, we increase the timeout interval exponentially.
    • When we have retried the packet a set number of times, we enter a failure state.
  • When we receive a new packet, we present the data to the application by setting Clear To Read (CTR) high. We generate an ack for our partner only when the user has consumed that data.
  • When we receive a re-transmitted packet (one that we have already successfully caught and read), we generate a new ACK, because the sender likely missed the previous one (hence their attempt to retransmit).
  • Finally, we run a keepalive timer: if we haven’t transmitted anything in some interval, we ping our partner to make sure they know we’re still open. Similarly, if we don’t hear anything for a while, we consider the port to be closed.
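The sender-side rules above (wait for the ack, retransmit on timeout, back off exponentially, fail after N tries) can be sketched like this. It is a hypothetical illustration, not MUDLink’s code: `TxSide`, the 2 ms starting timeout, and the limit of 8 retries are assumptions, and the current time is passed in explicitly (on Arduino you would pass `millis()`) so the logic can be tested off-target. The keepalive timer is left out for brevity.

```cpp
#include <cstdint>

// Hypothetical sketch of the sender side: stop-and-wait with exponential backoff.
enum class TxState { CLEAR, AWAITING_ACK, FAILED };

struct TxSide {
  TxState state = TxState::CLEAR;
  uint8_t txSeq = 0;           // rolling sequence number (wraps 255 -> 0)
  uint32_t baseTimeoutMs = 2;  // assumed initial timeout
  uint32_t timeoutMs = 2;      // current timeout, doubled after each retry
  uint32_t lastSendMs = 0;
  uint8_t retries = 0;         // retries spent on the message in flight
  uint8_t maxRetries = 8;      // assumed limit before entering the failure state
  uint32_t totalRetries = 0;   // running count, cf. txTotalRetries in the stats

  bool clearToSend() const { return state == TxState::CLEAR; }

  // Start a new message; the caller would write the frame to the TX buffer.
  bool send(uint32_t nowMs) {
    if (!clearToSend()) return false;
    txSeq++;
    timeoutMs = baseTimeoutMs;
    retries = 0;
    lastSendMs = nowMs;
    state = TxState::AWAITING_ACK;
    return true;
  }

  // Called for every valid incoming frame, with the partner's ACKD_SEQ field.
  void onAck(uint8_t ackdSeq) {
    if (state == TxState::AWAITING_ACK && ackdSeq == txSeq) state = TxState::CLEAR;
  }

  // Call from loop(); returns true when the in-flight frame should be re-sent.
  bool poll(uint32_t nowMs) {
    if (state != TxState::AWAITING_ACK) return false;
    if (nowMs - lastSendMs < timeoutMs) return false;  // unsigned math survives millis() wrap
    if (retries >= maxRetries) {
      state = TxState::FAILED;
      return false;
    }
    retries++;
    totalRetries++;
    timeoutMs *= 2;  // exponential backoff
    lastSendMs = nowMs;
    return true;
  }
};
```

Stale acks (for an older sequence number) are ignored, which is what lets the receiver safely re-ack retransmits.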

Statistics Tracking

So, I had mentioned that it’s often difficult to measure what’s going on in our networks, which makes it difficult to design systems. It turns out this is a bona fide problem in industry, especially in safety-critical systems: great engineering bookkeeping efforts are expended making sure that there will always be enough network bandwidth available in, for example, a 747’s avionics system.

In OSHW, which currently builds more ad-hoc projects, we rarely take on that level of bookkeeping, so we are left in the relative dark about how much bandwidth we are using in our networks, how fast they are really running, etc.

MUDLink does a little to ameliorate this by tracking some stats; these (self-describing) fields are available:

typedef struct MUDLStats {
  uint32_t rxSuccessCount = 0;
  uint32_t rxFailureCount = 0;
  uint32_t txSuccessCount = 0; 
  uint32_t txFailureCount = 0; 
  uint32_t txTotalRetries = 0;
  // the longest timeout issued since startup 
  uint32_t outgoingTimeoutLengthHighWaterMark = 0; 
  // maybe the most important, average message trip time, 
  float averageTotalTransmitTime = 0.0F;
  // comparable to the wire time (bits / baudrate) for most msgs 
  float averageWireTime = 0.0F;
  // and the num of tx retries normally required 
  float averageRetryCount = 0.0F; 
} MUDLStats;

These are available to the programmer, to assess their choice of baudrate, quality of underlying link, etc.
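As one way to use these numbers, here is a hypothetical helper (`LinkHealth` and `assessLink` are made up for this sketch) that turns raw counts into rates you can compare across baudrates. The struct is copied from above so the snippet compiles on its own, and it assumes `rxFailureCount` counts frames that failed decoding or CRC. How you act on the ratios (say, dropping the baudrate when retries per message climb) is left as a judgment call.

```cpp
#include <cstdint>

// Copy of the MUDLStats struct above, so this sketch compiles standalone.
typedef struct MUDLStats {
  uint32_t rxSuccessCount = 0;
  uint32_t rxFailureCount = 0;
  uint32_t txSuccessCount = 0;
  uint32_t txFailureCount = 0;
  uint32_t txTotalRetries = 0;
  uint32_t outgoingTimeoutLengthHighWaterMark = 0;
  float averageTotalTransmitTime = 0.0F;
  float averageWireTime = 0.0F;
  float averageRetryCount = 0.0F;
} MUDLStats;

// Hypothetical summary of link quality derived from the raw stats.
struct LinkHealth {
  float retriesPerMessage = 0.0F;  // retries spent per successfully delivered message
  float rxFailureFraction = 0.0F;  // share of received frames counted as failures
  float transitOverWire = 0.0F;    // how many times longer a transmit takes than its wire time
};

LinkHealth assessLink(const MUDLStats& s) {
  LinkHealth h;
  if (s.txSuccessCount > 0) {
    h.retriesPerMessage = float(s.txTotalRetries) / float(s.txSuccessCount);
  }
  uint32_t rxTotal = s.rxSuccessCount + s.rxFailureCount;
  if (rxTotal > 0) {
    h.rxFailureFraction = float(s.rxFailureCount) / float(rxTotal);
  }
  if (s.averageWireTime > 0.0F) {
    h.transitOverWire = s.averageTotalTransmitTime / s.averageWireTime;
  }
  return h;
}
```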

API and Protocol

MUDLink ingests any Serial instance within the Arduino ecosystem, meaning it can be deployed in just about any Arduino project. We instantiate it using a template like this:

// 2nd argument is the baudrate 
MUDL_Link<decltype(Serial1)> mudl(&Serial1, 2000000);

It has a pretty simple API, we can check if the link is clear to send or clear to read, and we can send and read packets.


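uint8_t appRx[255];
uint8_t appTx[255];
// length of the message staged in appTx (set by your application)
size_t msgLen = 0;

void loop(void){
  // MUDLink runs on an event loop: call this on every pass
  mudl.loop();

  // check-then-read,
  if(mudl.clearToRead()){
    mudl.read(appRx, 255);
  }

  // presuming we have stuffed appTx with a msgLen-byte message: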
  if(mudl.clearToSend()){
    mudl.send(appTx, msgLen);
  }

  // and we can get those stats, 
  MUDLStats stats = mudl.getStats(); 
}

The message spec is pretty simple as well: beneath the COBS encoding, each packet is laid out like this:

Byte:   0 … Len-5 | Len-4    | Len-3   | Len-2 | Len-1
Field:  Payload   | ACKD_SEQ | TXD_SEQ | CRC:1 | CRC:2

So, pretty small overhead, and the payload is simply missing when we send ack-only or keepalive messages.
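A sketch of packing and checking that block before COBS encoding and after decoding. Two details are assumptions, since the post doesn’t pin them down: that the CRC covers the payload plus both sequence bytes, and that CRC:1 is the high byte. `packBlock`, `unpackBlock`, and `ParsedBlock` are hypothetical names; the CRC routine is CRC16-CCITT with polynomial 0x1021 and an assumed initial value of 0xFFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// CRC16-CCITT (poly 0x1021); initial value 0xFFFF is an assumption.
static uint16_t crc16(const uint8_t* d, size_t n) {
  uint16_t crc = 0xFFFF;
  for (size_t i = 0; i < n; i++) {
    crc ^= static_cast<uint16_t>(d[i] << 8);
    for (int b = 0; b < 8; b++)
      crc = (crc & 0x8000) ? static_cast<uint16_t>((crc << 1) ^ 0x1021)
                           : static_cast<uint16_t>(crc << 1);
  }
  return crc;
}

// Lay out [payload][ACKD_SEQ][TXD_SEQ][CRC:1][CRC:2]; out needs payloadLen + 4 bytes.
// Pass payloadLen = 0 for ack-only / keepalive frames.
size_t packBlock(const uint8_t* payload, size_t payloadLen,
                 uint8_t ackdSeq, uint8_t txdSeq, uint8_t* out) {
  if (payloadLen > 0) memcpy(out, payload, payloadLen);
  out[payloadLen] = ackdSeq;
  out[payloadLen + 1] = txdSeq;
  uint16_t crc = crc16(out, payloadLen + 2);              // assumed: covers payload + seq bytes
  out[payloadLen + 2] = static_cast<uint8_t>(crc >> 8);   // assumed: CRC:1 is the high byte
  out[payloadLen + 3] = static_cast<uint8_t>(crc & 0xFF);
  return payloadLen + 4;
}

struct ParsedBlock {
  const uint8_t* payload;  // points into the caller's buffer
  size_t payloadLen;
  uint8_t ackdSeq;
  uint8_t txdSeq;
};

// Verify the CRC of a COBS-decoded block and split it into its fields.
bool unpackBlock(const uint8_t* blk, size_t len, ParsedBlock* p) {
  if (len < 4) return false;  // too short to hold the seq and CRC bytes
  uint16_t rxCrc = static_cast<uint16_t>((blk[len - 2] << 8) | blk[len - 1]);
  if (crc16(blk, len - 2) != rxCrc) return false;
  p->payload = blk;
  p->payloadLen = len - 4;
  p->ackdSeq = blk[len - 4];
  p->txdSeq = blk[len - 3];
  return true;
}
```

The packed block is what would then go through COBS and get its 0x00 delimiter, so an ack-only frame costs 4 + 2 = 6 bytes on the wire.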

System Limits, Future Work, and What I Learned While I Did It

MUDL has a pretty clear limit to performance, which is that it has an overhead floor of 2x: we will always use roughly half (or less) of the available bandwidth in a duplex link, because we are always waiting for our partner’s ack before we can send another. So, on a scope (with TX and RX lines tapped) we see this kind of pattern:

[Figure: oscilloscope capture of the TX and RX lines during MUDLink traffic]

This is ameliorated in grown-up protocols with transmission windows: each side announces how large its receive buffer is, and we can then transmit as many messages as fit in that buffer before receiving an ack. TCP does this, and hopefully MUDL will in the future… I have written that protocol out before, but it was a little shaky. Then again, it didn’t have CRC at that time, so I should bring my new learning back to that project.
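For a feel of what windowing changes, here is a hypothetical sketch of sender-side bookkeeping with an advertised receive window and cumulative acks. None of this is in MUDLink today; `WindowSender` and its fields are illustrative. With `rxWindow = 1` it reduces to the one-message-in-flight behaviour described above.

```cpp
#include <cstdint>

// Hypothetical sliding-window sender bookkeeping (not part of MUDLink).
// 8-bit sequence numbers wrap; casting differences back to uint8_t keeps
// the arithmetic correct modulo 256 as long as the window is small.
struct WindowSender {
  uint8_t nextSeq = 1;        // sequence number for the next new packet
  uint8_t oldestUnacked = 1;  // first packet not yet acknowledged
  uint8_t rxWindow = 1;       // advertised by the partner: packets its buffer can hold

  uint8_t inFlight() const { return static_cast<uint8_t>(nextSeq - oldestUnacked); }
  bool clearToSend() const { return inFlight() < rxWindow; }

  // Caller checks clearToSend() first; returns the seq to stamp on the frame.
  uint8_t send() { return nextSeq++; }

  // Cumulative ack: the partner has consumed everything up to and including ackdSeq.
  void onAck(uint8_t ackdSeq) {
    uint8_t retired = static_cast<uint8_t>(ackdSeq - oldestUnacked + 1);
    if (retired <= inFlight()) oldestUnacked = static_cast<uint8_t>(ackdSeq + 1);
    // otherwise the ack is stale (older than our window) and is ignored
  }
};
```

A single ack can retire several packets at once, so the sender keeps the line busy instead of idling for a full round trip per message.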

MUDL is also limited by the underlying performance of whichever block of Arduino Core code is running the Serial implementation. In this case I was using Earle the III’s RP2040 Core, which is wonderful (thanks Earle!), but which doesn’t seem to service the actual transmit interrupt fast enough. So, if we send 0b01010101 at 1MBaud, we see this:

[Figure: scope capture of 0b01010101 sent at 1 MBaud]

But if we config at 3MBaud:

[Figure: scope capture of 0b01010101 sent at 3 MBaud, with gaps between bytes]

So, what’s up? Well, we are transmitting at a speedy baudrate, but every time the RP2040’s TX FIFO dries up, it takes too long to refill (hence the inter-byte gap). Since a byte at 3MBaud (10 bits with UART framing) is only 3.3 microseconds long (about 667 clock cycles at 200MHz), we can run out of ISR time if we don’t write tight enough code. In practice, this means that while the RP2040 can run a UART at up to 3MBaud, we only ever see about 1MBaud of real performance.
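The byte-time arithmetic above as a few tiny helpers. The 10 bits per byte assumes 8N1 framing (start bit, 8 data bits, stop bit), and the 200MHz clock is the figure quoted above; the function names are just for illustration.

```cpp
// UART timing arithmetic, assuming 8N1 framing: 1 start + 8 data + 1 stop = 10 bits per byte.
constexpr double kBitsPerByte = 10.0;

// Time one byte occupies on the wire, in microseconds.
constexpr double byteTimeUs(double baud) { return kBitsPerByte / baud * 1e6; }

// CPU clock cycles that elapse while one byte is on the wire:
// the budget an ISR has to refill the TX FIFO before a gap appears.
constexpr double cyclesPerByte(double baud, double cpuHz) { return cpuHz * kBitsPerByte / baud; }

// Best-case throughput with no inter-byte gaps, in bytes per second.
constexpr double bytesPerSecond(double baud) { return baud / kBitsPerByte; }
```

At 3MBaud, `byteTimeUs(3e6)` is about 3.33 µs and `cyclesPerByte(3e6, 200e6)` about 667 cycles. If the effective rate is closer to 1MBaud, the data actually moving is nearer `bytesPerSecond(1e6)` = 100,000 bytes/s than the 300,000 bytes/s the port is configured for.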

This kind of thing is why the pros use DMA, etc., and why I am thinking I ought to return to that practice. With DMA, our app writes a big TX buffer, and then each byte is ferried into the UART without the processor’s intervention at all, normally triggering an interrupt when the transmit is totally complete.

High-speed hardo embedded code trades off with portability, though, of course. For now, I like the flexible, code-only transport / link layer as it is, but I look forward to writing custom implementations for some of my favourite microcontrollers. One of the nice things about this scheme is that a sloppy implementation can still pair up with a high-speed implementation, and everything should still work.

Odds and Ends

One other thing I learned during this cycle is that the Arduino I2C library is blocking - so when we use these OLED displays (see the first picture here), which require a big chunk of data in the screenbuffer, we can get big blocking calls in our Arduino code. In the pictured demo, the display write is over 20ms.

This could also be ameliorated by some hardo embedded code, but it’s actually been a good foil: MUDLink works despite the blocking call, which both slows down servicing of the transmit routine and causes packets to be missed entirely (when the blocking call happens during a packet receive, we can’t run our loop, so we miss bytes!).

What about Busses!

One last note is that I’ve come up with a pinout that suits this protocol (over RS485), as well as a bus topology. In the future, I would love to build the protocol to serve point-to-point as well as bus topologies, perhaps dynamically switching between the two.

The last trick to add would be dynamic baudrate scaling, to make these ports truly plug and play across devices, without any config.