Testing ESP-NOW in CI with real hardware
March 2026, Armin
Last week I needed a CI test for a feature that uses two ESP32-S3 boards talking over ESP-NOW. Not Wi-Fi, ESP-NOW. Peer-to-peer, connectionless, sub-millisecond latency, no access point. You reach for it when you need fast, low-overhead comms between ESPs: remote controls, sensor meshes, swarm robots.
The problem hit me immediately: you can't simulate two radios in CI. You can unit-test packet parsing. You can mock the ESP-NOW API. But none of those tests tell you whether two boards find each other on the air, whether packets arrive within your latency budget, whether RSSI values make sense, or whether your sequence numbering survives real RF. For that you need two physical boards, both programmed, both running, both within radio range.
So I built it. Two ESP32-S3s, one sender, one receiver, flashed and tested in a pytest run against real hardware. This post is the walkthrough. Full source is on GitHub.

The setup
Two ESP-IDF projects. Sender broadcasts a 4-byte packet once per second with an incrementing sequence number. Receiver listens, tracks what arrives, reports RSSI and packet loss stats. Both print structured output to serial so the test harness can parse it.
┌─────────────────────┐          ESP-NOW           ┌─────────────────────┐
│  ESP32-S3 #1        │ ─ ─ ─ ─ broadcast ─ ─ ─ ─ ▶│  ESP32-S3 #2        │
│  sender             │       2.4 GHz, ~1ms        │  receiver           │
│                     │                            │                     │
│  TX: seq=1          │                            │  RX: seq=1 rssi=-19 │
│  TX: seq=2          │                            │  RX: seq=2 rssi=-20 │
│  TX: seq=3          │                            │  RX: seq=3 rssi=-19 │
└─────────────────────┘                            └─────────────────────┘

The boards sit on a SiliconRig pod, a pod master with ESP32-S3 dev boards connected over USB. I flash firmware and read serial remotely through SiliconRig's Python SDK. The boards are a few centimeters apart, so RF conditions are predictable: RSSI around -15 to -30 dBm, zero packet loss.
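Because both boards print one record per line, the parsing side can stay tiny. Here is a minimal sketch of a parser for the line formats shown in the diagram. The name parse_line and the tuple layout are my own illustration; they are not part of the SiliconRig SDK or the example repo.

```python
import re

# Line formats as printed by the sender and receiver firmware in this post.
TX_RE = re.compile(r"^TX: seq=(\d+)$")
RX_RE = re.compile(r"^RX: seq=(\d+) rssi=(-?\d+)$")


def parse_line(line):
    """Return ("tx", seq), ("rx", seq, rssi), or None for anything else.

    Hypothetical helper, for illustration only.
    """
    line = line.strip()
    m = TX_RE.match(line)
    if m:
        return ("tx", int(m.group(1)))
    m = RX_RE.match(line)
    if m:
        return ("rx", int(m.group(1)), int(m.group(2)))
    return None  # boot noise, SUMMARY lines, garbled lines


sample = "RX: seq=1 rssi=-19\nboot noise\nRX: seq=2 rssi=-20\n"
records = [r for r in map(parse_line, sample.splitlines()) if r]
print(records)
```

Anchoring the patterns at both ends means a line that got mixed with another one is rejected instead of being half-parsed.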
Sender firmware
The sender is as simple as ESP-NOW gets. Init Wi-Fi in STA mode (ESP-NOW requires it even though we never connect to an AP), add a broadcast peer, loop forever.
#include <stdio.h>
#include <string.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_wifi.h"
#include "esp_now.h"
#include "esp_mac.h"
#include "nvs_flash.h"

static const uint8_t broadcast_addr[ESP_NOW_ETH_ALEN] =
    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};

typedef struct {
    uint32_t seq;
} __attribute__((packed)) packet_t;

static void wifi_init(void)
{
    ESP_ERROR_CHECK(esp_netif_init());
    ESP_ERROR_CHECK(esp_event_loop_create_default());
    wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
    ESP_ERROR_CHECK(esp_wifi_init(&cfg));
    /* STA mode is required for ESP-NOW; we never connect to an AP. */
    ESP_ERROR_CHECK(esp_wifi_set_mode(WIFI_MODE_STA));
    ESP_ERROR_CHECK(esp_wifi_start());
    /* Argument is in units of 0.25 dBm: 40 = 10 dBm. */
    ESP_ERROR_CHECK(esp_wifi_set_max_tx_power(40));
}

void app_main(void)
{
    ESP_ERROR_CHECK(nvs_flash_init());
    wifi_init();
    ESP_ERROR_CHECK(esp_now_init());

    /* Register the broadcast address as a peer so esp_now_send accepts it. */
    esp_now_peer_info_t peer = {
        .channel = 0,            /* 0 = use the current channel */
        .ifidx = WIFI_IF_STA,
        .encrypt = false,
    };
    memcpy(peer.peer_addr, broadcast_addr, ESP_NOW_ETH_ALEN);
    ESP_ERROR_CHECK(esp_now_add_peer(&peer));

    printf("SENDER: ready\n");

    packet_t pkt = { .seq = 0 };
    while (1) {
        pkt.seq++;
        esp_err_t err = esp_now_send(
            broadcast_addr, (const uint8_t *)&pkt, sizeof(pkt));
        /* Only log packets that were handed to the radio successfully. */
        if (err == ESP_OK)
            printf("TX: seq=%lu\n", (unsigned long)pkt.seq);
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

One line matters here: esp_wifi_set_max_tx_power(40). That caps TX at 10 dBm. Without it, two boards sharing a USB hub and hitting full Wi-Fi TX simultaneously draw more current than the USB controller can supply. One board brownout-resets in a loop, the other keeps running. The test just times out with no useful error. 10 dBm is plenty when the boards sit next to each other.
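Why 40 and not 10? esp_wifi_set_max_tx_power takes its argument in units of 0.25 dBm. The arithmetic, as a small sketch (the helper names are made up for illustration, and the radio may round the request to the nearest level it actually supports):

```python
def dbm_to_quarter_dbm(dbm):
    """Convert dBm to the 0.25 dBm units esp_wifi_set_max_tx_power expects."""
    return round(dbm * 4)


def dbm_to_milliwatts(dbm):
    """Radiated power in milliwatts for a given dBm value."""
    return 10 ** (dbm / 10)


print(dbm_to_quarter_dbm(10))   # 40, the value used in the sender
print(dbm_to_milliwatts(10))    # 10.0 mW
print(dbm_to_milliwatts(20))    # 100.0 mW
```

So 10 dBm is a tenth of the radiated power of 20 dBm. Supply current does not scale by the same factor, but the TX peaks get noticeably smaller.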
The broadcast address (FF:FF:FF:FF:FF:FF) means any ESP-NOW receiver on the same channel picks it up. No MAC address exchange needed. The __attribute__((packed)) on the packet struct is overkill for a single uint32_t, but it's a habit. You don't want padding surprises when you add fields later.
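For reference, this is what the 4-byte payload looks like from the host side. It's a sketch that assumes the ESP32-S3's little-endian byte order; the tests in this post never decode raw payloads, so treat it purely as an illustration of the layout.

```python
import struct

# packet_t is a single uint32_t; "<I" means little-endian unsigned 32-bit.
PACKET = struct.Struct("<I")

payload = PACKET.pack(3)            # bytes on the air for seq=3
print(payload.hex(), len(payload))  # 03000000 4

(seq,) = PACKET.unpack(payload)
print(seq)                          # 3

# Why packing matters once fields are added: a uint8_t followed by a
# uint32_t is 5 bytes packed, but native alignment usually pads it.
print(struct.calcsize("<BI"))       # 5 (no padding)
print(struct.calcsize("@BI"))       # platform-dependent, typically 8
```

The last two lines are the "padding surprise": the receiver's length check compares against sizeof(packet_t), so both sides have to agree on the layout.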
Receiver firmware
The receiver is more interesting. ESP-NOW delivers packets via a callback that runs on the Wi-Fi task. The callback context limits what you can safely do.
My first version had printf directly in the callback. Worked fine locally. Then I ran it through the remote serial proxy and got this:
RX: seq=9 rssi=-20
SUMMARY: received=10 mRX: seq=11 rssi=-19
issed=0 total=10 loss=0.0%

The SUMMARY line got shredded. The UART TX FIFO on the ESP32-S3 is 128 bytes, and when printf waits for it to drain, FreeRTOS yields back to the Wi-Fi task, which calls the callback again and starts another printf.
The fix is a FreeRTOS queue. The callback stuffs a struct into the queue, a separate task drains it and prints:
/* Snapshot handed from the receive callback to the print task. */
typedef struct {
    uint32_t seq;
    int rssi;
    uint32_t rx_count;
    uint32_t missed;
} print_msg_t;

static uint32_t rx_count = 0;
static uint32_t last_seq = 0;
static uint32_t missed = 0;
static QueueHandle_t print_queue;

/* Runs on the Wi-Fi task: do the bookkeeping, enqueue, return. No printf. */
static void on_recv(const esp_now_recv_info_t *info,
                    const uint8_t *data, int len)
{
    if (len != sizeof(packet_t)) return;

    packet_t pkt;
    memcpy(&pkt, data, sizeof(pkt));

    rx_count++;
    /* A forward jump in seq means the packets in between were lost. */
    if (last_seq > 0 && pkt.seq > last_seq + 1)
        missed += (pkt.seq - last_seq - 1);
    last_seq = pkt.seq;

    print_msg_t msg = {
        .seq = pkt.seq,
        .rssi = info->rx_ctrl->rssi,
        .rx_count = rx_count,
        .missed = missed,
    };
    /* Timeout 0: never block the Wi-Fi task; drop the log line if full. */
    xQueueSend(print_queue, &msg, 0);
}

/* The only task that writes to serial. */
static void print_task(void *arg)
{
    print_msg_t msg;
    while (1) {
        if (xQueueReceive(print_queue, &msg, portMAX_DELAY)) {
            printf("RX: seq=%lu rssi=%d\n",
                   (unsigned long)msg.seq, msg.rssi);
            /* Every 10th packet, print running loss statistics. */
            if (msg.rx_count % 10 == 0) {
                uint32_t total = msg.rx_count + msg.missed;
                printf("SUMMARY: received=%lu missed=%lu "
                       "total=%lu loss=%.1f%%\n",
                       (unsigned long)msg.rx_count,
                       (unsigned long)msg.missed,
                       (unsigned long)total,
                       total > 0
                           ? (msg.missed * 100.0 / total)
                           : 0.0);
            }
        }
    }
}

All serial I/O now happens from one task. No more interleaving.
Packet loss tracking is straightforward: if we receive seq=5 after seq=2, we missed 3 and 4. Doesn't catch duplicates or reordering, but for a broadcast link quality check it's enough.
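To make the blind spots concrete, here is the same gap-counting logic as a host-side Python model. It's a sketch that mirrors the condition in on_recv; the function names are mine, and it ignores uint32_t wraparound.

```python
def count_missed(seqs):
    """Replay received sequence numbers through the receiver's gap counter."""
    rx_count = 0
    last_seq = 0
    missed = 0
    for seq in seqs:
        rx_count += 1
        # Same condition as on_recv: only forward jumps count as loss.
        if last_seq > 0 and seq > last_seq + 1:
            missed += seq - last_seq - 1
        last_seq = seq
    return rx_count, missed


def loss_percent(rx_count, missed):
    """Loss rate as printed in the SUMMARY line."""
    total = rx_count + missed
    return missed * 100.0 / total if total > 0 else 0.0


print(count_missed([1, 2, 5]))      # (3, 2): 3 and 4 were lost
print(count_missed([1, 2, 2, 3]))   # (4, 0): duplicate inflates received
print(count_missed([1, 3, 2, 4]))   # (4, 2): reordering looks like loss
print(loss_percent(3, 2))           # 40.0
```

The second and third cases are the ones the firmware gets wrong: a duplicate is counted as a received packet, and a reordered pair is counted as two losses even though nothing was dropped.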
The pytest test
The test uses SiliconRig's Python SDK to grab two physical ESP32-S3 boards, flash them, and read their serial output, all from pytest fixtures.
import re
from pathlib import Path

import pytest
from siliconrig import Board

_DIR = Path(__file__).resolve().parent.parent
SENDER_FW = str(_DIR / "sender" / "build" / "sender.bin")
RECEIVER_FW = str(_DIR / "receiver" / "build" / "receiver.bin")


@pytest.fixture(scope="module")
def sender():
    with Board("esp32-s3", firmware=SENDER_FW) as b:
        b.expect("TX: seq=", timeout=60)
        b.flush()
        yield b


@pytest.fixture(scope="module")
def receiver():
    with Board("esp32-s3", firmware=RECEIVER_FW) as b:
        b.expect("RX: seq=", timeout=60)
        b.flush()
        yield b

scope="module" keeps the boards flashed and running across all tests. expect() blocks until the pattern shows up in serial. That's how we wait for boot. flush() tosses buffered boot noise so each test starts clean.
One detail worth noting: the sender fixture waits for "TX: seq=", not "SENDER: ready". During flash, esptool owns the serial port. By the time the serial proxy reconnects, the board has already printed "ready" and moved on. That message is gone. But the board keeps transmitting every second, so waiting for a TX line works reliably.
The tests:
def test_sender_transmits(sender):
    """Sender should produce TX lines."""
    output = sender.read_until("TX: seq=", timeout=10)
    assert "TX: seq=" in output


def test_receiver_gets_packets(sender, receiver):
    """Receiver should receive packets over ESP-NOW."""
    output = receiver.read_until("RX: seq=", timeout=15)
    assert "RX: seq=" in output


def test_rssi_in_range(sender, receiver):
    """RSSI should be between -100 and 0 dBm."""
    output = receiver.read(timeout=5)
    match = re.search(r"rssi=(-?\d+)", output)
    assert match, f"no RSSI value found: {output!r}"
    rssi = int(match.group(1))
    assert -100 <= rssi <= 0, f"RSSI {rssi} out of range"


def test_no_excessive_packet_loss(sender, receiver):
    """Loss rate should be below 50%."""
    output = receiver.read_until("SUMMARY:", timeout=30)
    output += receiver.read_until("SUMMARY:", timeout=30)
    match = re.search(r"loss=([\d.]+)%", output)
    assert match, f"no loss value found: {output!r}"
    loss = float(match.group(1))
    assert loss < 50.0, f"packet loss {loss}% exceeds threshold"

Four tests. Does the sender transmit? Do packets cross the air gap? Is RSSI sane? Is packet loss acceptable? The 50% threshold sounds generous, but with the boards centimeters apart, actual loss is 0%. The threshold catches "something is fundamentally broken", not tight RF specs.
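The RSSI and loss tests each look at the first value that shows up in the captured text. If you would rather check every sample in a chunk, a variation could look like the sketch below. It works on a plain string, so it makes no assumptions about the SDK; the helper names are hypothetical and not part of the example repo.

```python
import re


def rssi_values(output):
    """All RSSI readings found in a chunk of receiver output."""
    return [int(v) for v in re.findall(r"rssi=(-?\d+)", output)]


def last_loss(output):
    """Loss percentage from the most recent SUMMARY line, or None."""
    values = re.findall(r"loss=([\d.]+)%", output)
    return float(values[-1]) if values else None


captured = (
    "RX: seq=9 rssi=-20\n"
    "RX: seq=10 rssi=-19\n"
    "SUMMARY: received=10 missed=0 total=10 loss=0.0%\n"
    "RX: seq=11 rssi=-21\n"
)

readings = rssi_values(captured)
assert readings, "no RSSI values found"
assert all(-100 <= r <= 0 for r in readings)
assert last_loss(captured) < 50.0
print(readings, last_loss(captured))
```

Checking all readings costs nothing and catches a single bogus sample that a first-match search would skip.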
Fixture ordering matters: test_sender_transmits runs first and creates the sender fixture. By the time test_receiver_gets_packets creates the receiver, the sender is already broadcasting, so the receiver picks up packets immediately after boot.
Running it
Locally:
pip install siliconrig pytest
export SRIG_API_KEY=key_...
cd esp-now-demo
pytest test/test_esp_now.py -v

test/test_esp_now.py::test_sender_transmits PASSED [ 25%]
test/test_esp_now.py::test_receiver_gets_packets PASSED [ 50%]
test/test_esp_now.py::test_rssi_in_range PASSED [ 75%]
test/test_esp_now.py::test_no_excessive_packet_loss PASSED [100%]
========================= 4 passed in 74.69s =========================

75 seconds. Most of that is flashing, about 25 seconds per board over the remote USB link. The tests themselves run in under 30 seconds.
In CI, the GitHub Actions workflow is short:
name: ESP-NOW HIL Test
on:
  push:
    branches: [main]
jobs:
  hil-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install dependencies
        run: pip install siliconrig pytest
      - name: Run HIL tests
        env:
          SRIG_API_KEY: ${{ secrets.SRIG_API_KEY }}
        run: pytest test/test_esp_now.py -v

No self-hosted runner. No hardware in a closet. ubuntu-latest calls into SiliconRig's API, which allocates two boards on a remote pod, flashes them, streams serial back over WebSocket, and releases them when done. Real silicon on every push.
Debugging notes
Brownout resets. Two boards boot fine on their own. Start transmitting at the same time, one resets. No error in the serial output because the board resets before it can print anything. A USB current meter on the hub showed the draw spiking past what the port could supply. esp_wifi_set_max_tx_power(40) fixed it.
Printf from callback. The data was all there, just garbled. A mutex didn't help, because the problem isn't concurrent access to printf. It's that printf yields partway through a long format string when the FIFO fills up, and then the callback fires again on the newly-scheduled Wi-Fi task. The queue pattern is the right fix.
Serial-after-flash timing. My first test fixture waited for "SENDER: ready". Worked 1 in 5 runs. The other 4 times it timed out because the ready message was printed during the ~200ms between esptool releasing the serial port and the WebSocket reconnecting. Design your firmware to output periodic status, not one-shot boot messages. This applies to any remote serial setup, not just SiliconRig.
What this doesn't cover
This setup has clear limits.
The boards are centimeters apart. That tells you nothing about range. The RF environment on a pod is quiet and predictable: no microwaves, no Bluetooth earbuds, no neighbor's Wi-Fi blasting on the same channel. The serial proxy adds 5-10 ms of jitter, so if your protocol has sub-millisecond timing requirements you need on-board instrumentation, not remote serial parsing. The boards are USB-powered, so battery behavior is out of scope.
What this is good for: regression testing. Verifying that yesterday's working RF link still works after today's firmware change. A smoke test for your radio stack, run on every commit. That alone is more than most embedded teams have.
Full source (firmware, test, CI config) is at github.com/siliconrig/examples/tree/main/esp-now-demo. If you run into issues or want to talk about your HIL setup, drop me a line at feedback@siliconrig.dev.