# roto

Zero-allocation Rust protobuf reader and writer.

## Overview

Instead of deserializing binary protobuf data into Rust structs, roto scans a message _once_ on
construction, recording the byte offset of each field, and then reads fields on demand directly
from the original bytes. No heap allocation, no data copying, no full deserialization upfront.

Writing works the same way: you provide a fixed buffer and a builder writes fields directly into it,
returning a slice of the bytes written.

## Design

`protoc` generates a `CodeGeneratorRequest` message; `protoc-gen-roto` (in
`src/bin/protoc-gen-roto.rs`) reads this from stdin, generates Rust source files, and writes a
`CodeGeneratorResponse` to stdout. `protoc` then writes those `.rs` files to disk. The generated
files are included directly in the crate that uses the protobuf messages.

Sample usage:

```sh
protoc -Iproto/ proto/hackers.proto --plugin=./target/debug/protoc-gen-roto --roto_out=src/
```

This generates the file `src/hackers.rs`.

## Generated code

For each protobuf message roto generates two types:

- **Reader struct** `MessageName<'a>`: borrows the original byte slice, zero-copy.
- **Builder struct** `MessageNameBuilder<'b>`: writes into a caller-provided `&mut [u8]`.

Nested message types are placed in a `pub mod message_name { ... }` module (snake_case of the
parent message name) within the same generated file.
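
To make the module naming rule concrete, here is a minimal sketch of a snake_case conversion. The function name `module_name` is hypothetical, and the generator's actual conversion (for example, how it treats acronyms) may differ; this only illustrates the mapping for simple CamelCase names such as `Hello` or `InnerWorld`.

```rust
/// Hypothetical helper: converts a CamelCase message name to snake_case.
/// Assumption: an underscore is inserted only before an uppercase letter
/// that follows a lowercase letter or a digit.
fn module_name(message_name: &str) -> String {
    let mut out = String::with_capacity(message_name.len() + 4);
    let mut prev_lower_or_digit = false;
    for ch in message_name.chars() {
        if ch.is_ascii_uppercase() {
            if prev_lower_or_digit {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
            prev_lower_or_digit = false;
        } else {
            out.push(ch);
            prev_lower_or_digit = ch.is_ascii_lowercase() || ch.is_ascii_digit();
        }
    }
    out
}
```

Under this rule, a message `Hello` with a nested `InnerWorld` would expose the nested types under `hello::`, which matches the `hello::InnerWorld` path used in the samples.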

## Sample usage

Given this proto definition:

```proto
message Hello {
  string hello_world = 1;
  message InnerWorld {
    string thought = 1;
  }
  InnerWorld inner_world = 2;
}
```

### Reading

```rust
fn parse_proto(data: &[u8]) -> roto::Result<String> {
    // Scan the data once, recording field offsets
    let hello = Hello::new(data)?;

    // String fields return &str borrowed from the original bytes (zero-copy)
    let hello_world: &str = hello.hello_world()?;

    // Nested message fields return &[u8]; construct the nested reader from those bytes
    let inner_bytes: &[u8] = hello.inner_world()?;
    let inner_world = hello::InnerWorld::new(inner_bytes)?;
    let thought: &str = inner_world.thought()?;

    Ok(format!("{} is about {}", hello_world, thought))
}
```

Fields absent from the binary data return `Err(roto::RotoError::FieldNotFound)`.
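
Because an absent field surfaces as an error, callers that treat a field as optional may want to separate "not present" from real failures. The sketch below shows one way to do that. It uses a stand-in `RotoError` enum so that it compiles on its own: only `FieldNotFound` is named in this README, while the `Malformed` variant and the helper names `optional` and `greeting_or_default` are assumptions made for illustration.

```rust
// Stand-in for roto's error type so this sketch compiles on its own.
// Only `FieldNotFound` is named in this README; `Malformed` is hypothetical.
#[derive(Debug, PartialEq)]
pub enum RotoError {
    FieldNotFound,
    Malformed,
}

/// Turns "field absent" into `Ok(None)` while still propagating other errors.
pub fn optional<T>(res: Result<T, RotoError>) -> Result<Option<T>, RotoError> {
    match res {
        Ok(value) => Ok(Some(value)),
        Err(RotoError::FieldNotFound) => Ok(None),
        Err(other) => Err(other),
    }
}

/// Example: fall back to a default greeting when the field is absent.
pub fn greeting_or_default(field: Result<&str, RotoError>) -> Result<&str, RotoError> {
    Ok(optional(field)?.unwrap_or("hello"))
}
```

With the real crate, the same pattern would presumably be applied to the result of an accessor such as `hello.hello_world()`.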

### Writing

Nested messages must be serialized into a scratch buffer first, then embedded as raw bytes in the
outer builder.

```rust
fn build_proto(buf: &mut [u8]) -> roto::Result<&[u8]> {
    // Serialize the inner message first
    let mut inner_buf = [0u8; 256];
    let inner_bytes = hello::InnerWorldBuilder::builder(&mut inner_buf)
        .thought("some thought")?
        .finish()?;

    // Build the outer message, embedding the serialized inner bytes
    let written = HelloBuilder::builder(buf)
        .hello_world("some world")?
        .inner_world(inner_bytes)?
        .finish()?; // Result<&'b mut [u8]>: the written portion of buf

    // Reborrow the mutable slice as a shared one to match the return type
    Ok(written)
}
```

Builder methods consume `self` and return `Result<Self>`, enabling `?`-based chaining.
`finish()` returns `Result<&'b mut [u8]>`, a slice of the portion of the buffer that was written.
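
The scratch-buffer step follows from the wire format: a nested message is a length-delimited field, so its byte length has to be known before its contents are written. The sketch below illustrates this with a hand-rolled encoder over a fixed buffer. The function names `write_varint` and `write_len_field` and their signatures are assumptions for this example and are not taken from roto's source.

```rust
/// Writes `value` as a base-128 varint at the start of `buf`.
/// Returns the number of bytes written, or `None` if `buf` is too small.
fn write_varint(mut value: u64, buf: &mut [u8]) -> Option<usize> {
    let mut i = 0;
    loop {
        let slot = buf.get_mut(i)?;
        let low7 = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            *slot = low7; // last byte: continuation bit clear
            return Some(i + 1);
        }
        *slot = low7 | 0x80; // more bytes follow
        i += 1;
    }
}

/// Appends a length-delimited field (wire type 2) to `buf` starting at `pos`.
/// Returns the position just past the written field, or `None` on overflow.
fn write_len_field(
    buf: &mut [u8],
    pos: usize,
    field_number: u32,
    payload: &[u8],
) -> Option<usize> {
    let mut at = pos;
    let tag = (u64::from(field_number) << 3) | 2;
    at += write_varint(tag, buf.get_mut(at..)?)?;
    // The length prefix comes before the payload, so the payload
    // must already be fully serialized at this point.
    at += write_varint(payload.len() as u64, buf.get_mut(at..)?)?;
    let end = at.checked_add(payload.len())?;
    buf.get_mut(at..end)?.copy_from_slice(payload);
    Some(end)
}
```

Encoding the inner message into its own buffer and then passing the resulting slice as `payload` for field 2 of the outer message mirrors the two-step pattern in `build_proto`.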

### Repeated fields

Repeated fields return a `RepeatedFieldIterator<'a>`. Each item yields `Result<(&[u8], WireType)>`.

```rust
// Assumes `Hello` also declares a repeated field named `tags`
// (not part of the proto definition above).
let hello = Hello::new(data)?;
for item in hello.tags() {
    let (value_bytes, _wire_type) = item?;
    // decode value_bytes according to the expected wire type
}
```
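
The decoding step left as a comment above could look roughly like the following sketch. It assumes that `value_bytes` holds only the encoded value (no tag, and no length prefix for length-delimited items), which this README does not confirm. The `WireType` enum here is a two-variant stand-in, not roto's type, and `Decoded` and `decode_item` are hypothetical names.

```rust
// Stand-in for roto's `WireType`; only the two variants used below.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WireType {
    Varint,
    Len,
}

// Hypothetical result type for this example.
#[derive(Debug, PartialEq)]
enum Decoded<'a> {
    Uint(u64),
    Text(&'a str),
}

/// Decodes a base-128 varint that must end within `bytes` (at most 10 bytes).
fn decode_varint(bytes: &[u8]) -> Option<u64> {
    let mut value = 0u64;
    for (i, &b) in bytes.iter().enumerate().take(10) {
        value |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some(value);
        }
    }
    None // ran out of bytes before the terminating byte
}

/// Interprets one repeated item according to its wire type.
/// Length-delimited items are treated as UTF-8 strings in this sketch.
fn decode_item<'a>(value_bytes: &'a [u8], wire_type: WireType) -> Option<Decoded<'a>> {
    match wire_type {
        WireType::Varint => decode_varint(value_bytes).map(Decoded::Uint),
        WireType::Len => std::str::from_utf8(value_bytes).ok().map(Decoded::Text),
    }
}
```

The string case borrows from `value_bytes`, so it stays zero-copy in the same way as the generated `&str` accessors.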

## Runtime API

The core runtime in `src/lib.rs` provides:

- `ProtoAccessor<'a>`: scans a message's fields and reads values at recorded offsets.
- `ProtoBuilder<'a>`: writes fields into a provided `&mut [u8]` buffer.
- `FieldIterator<'a>` / `RepeatedFieldIterator<'a>`: iterators over fields and repeated fields.
- `Tag`, `WireType`: protobuf encoding primitives.
- `read_varint`, `write_varint`, `skip_value`: low-level wire-format helpers.
- `RotoError`, `Result<T>`: error type and alias.
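
For readers unfamiliar with the encoding primitives, the sketch below shows how a protobuf tag key packs a field number and a wire type, following the public encoding guide. The enum and the functions `split_tag` and `make_tag` are written for this example; roto's own `Tag` and `WireType` may be shaped differently.

```rust
// Wire types as numbered in the protobuf encoding guide.
// This is a stand-in, not roto's `WireType`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WireType {
    Varint = 0,
    I64 = 1,
    Len = 2,
    SGroup = 3,
    EGroup = 4,
    I32 = 5,
}

/// Splits a decoded tag key into (field_number, wire_type).
/// Returns `None` for unknown wire types or out-of-range field numbers.
fn split_tag(key: u64) -> Option<(u32, WireType)> {
    let wire_type = match key & 0x7 {
        0 => WireType::Varint,
        1 => WireType::I64,
        2 => WireType::Len,
        3 => WireType::SGroup,
        4 => WireType::EGroup,
        5 => WireType::I32,
        _ => return None,
    };
    let number = key >> 3;
    if number == 0 || number > u64::from(u32::MAX) {
        return None;
    }
    Some((number as u32, wire_type))
}

/// Packs a field number and wire type back into a tag key.
fn make_tag(field_number: u32, wire_type: WireType) -> u64 {
    (u64::from(field_number) << 3) | wire_type as u64
}
```

For the `Hello` message above, field 1 (`hello_world`, a string) has key `0x0a` and field 2 (`inner_world`, a nested message) has key `0x12`; both are length-delimited.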

## High-level design

On construction (`MessageName::new(data)`), the generated reader struct iterates the binary once
using `FieldIterator` and records the byte offset of each field's tag. Subsequent field accesses
call `ProtoAccessor::get_value_at(offset)` without re-scanning. For repeated fields, the start and
end offsets of the field range are recorded to bound iteration efficiently.
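
The single-pass scan described above can be sketched as follows. This is an illustration of the technique under stated assumptions, not roto's implementation: the functions below are written for this example (they share names with, but are not, the runtime's `read_varint` and `skip_value`), group wire types are rejected, and offsets are stored in a caller-provided slice to keep the sketch allocation-free.

```rust
/// Reads a varint at `pos`; returns (value, position just past it).
fn read_varint(data: &[u8], pos: usize) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for i in 0..10 {
        let at = pos.checked_add(i)?;
        let byte = *data.get(at)?;
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, at + 1));
        }
    }
    None
}

/// Returns the position just past a value of the given wire type at `pos`.
fn skip_value(data: &[u8], pos: usize, wire_type: u64) -> Option<usize> {
    let end = match wire_type {
        0 => read_varint(data, pos)?.1, // varint
        1 => pos.checked_add(8)?,       // fixed 64-bit
        2 => {
            // Length-delimited: jump over the payload using its length prefix.
            let (len, start) = read_varint(data, pos)?;
            if len > data.len() as u64 {
                return None;
            }
            start.checked_add(len as usize)?
        }
        5 => pos.checked_add(4)?, // fixed 32-bit
        _ => return None,         // groups are not handled in this sketch
    };
    if end <= data.len() { Some(end) } else { None }
}

/// Records (field_number, offset of the field's tag) for each top-level field.
/// Returns how many entries of `out` were filled.
fn scan_offsets(data: &[u8], out: &mut [(u32, usize)]) -> Option<usize> {
    let mut pos = 0;
    let mut count = 0;
    while pos < data.len() {
        let (key, after_tag) = read_varint(data, pos)?;
        let slot = out.get_mut(count)?;
        *slot = ((key >> 3) as u32, pos);
        count += 1;
        pos = skip_value(data, after_tag, key & 0x7)?;
    }
    Some(count)
}
```

Because nested messages are skipped by length rather than decoded, the scan touches only the top-level tags and length prefixes.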

## Benchmarks

Two benchmark suites, one for roto (Rust, criterion) and one for upb (C), share the same binary
data files and the same four measurement groups:

| Group           | What is timed                                           |
| --------------- | ------------------------------------------------------- |
| `shallow_parse` | Become ready to read any field (one scan / full decode) |
| `deep_parse`    | Walk the full tree: Campaign → Operations → Hackers     |
| `field_access`  | Read individual fields on an already-parsed message     |
| `iterate`       | Count top-level and nested repeated fields              |

### 1. Generate the shared data files (do this once)

Data files are written to `data/bench/`.

```sh
cargo run --release --bin gen_bench_data -- --preset tiny
cargo run --release --bin gen_bench_data -- --preset small
cargo run --release --bin gen_bench_data -- --preset medium
cargo run --release --bin gen_bench_data -- --preset large
```

For even larger inputs use `--preset huge` (~500 MB) or set the knobs
directly:

```sh
# ~50 MB: 500 operations × 100 KB stolen_data each
cargo run --release --bin gen_bench_data -- --ops 500 --stolen-kb 100 --output data/bench/50mb.pb
```

### 2. Rust benchmark (criterion)

```sh
cargo bench --bench hackers_bench
```

HTML reports are written to `target/criterion/`. To run a single group, pass its name as a filter:

```sh
cargo bench --bench hackers_bench -- shallow_parse
```

### 3. C / upb benchmark

Requires protobuf ≥ 21 with `protoc-gen-upb` (ships with modern `protoc`).

```sh
cd upb_test
make          # compiles hackers_bench from the pre-generated upb files
./hackers_bench
```

To regenerate the upb C files from `proto/hackers.proto`:

```sh
cd upb_test && make regen
```

### 4. Results

Measured on Linux x86-64 with the four standard presets. Rust times are
criterion medians; C/upb times are the custom runner's mean over ≥ 0.5 s.

#### `shallow_parse`: cost to become ready to read any field

| Size   |       Bytes | roto (ns) |     upb (ns) | roto speedup |
| ------ | ----------: | --------: | -----------: | -----------: |
| tiny   |         588 |      32.7 |        606.2 |    **18.5×** |
| small  |      20,265 |     182.9 |     22,619.2 |   **123.7×** |
| medium |   2,071,053 |  16,632.0 |  5,346,977.2 |     **321×** |
| large  | 102,608,384 |   1,618.6 | 41,132,079.7 |  **25,411×** |

> roto's cost is O(number of top-level fields): it records field offsets by
> jumping past nested blobs using their length prefixes. upb fully decodes the
> entire tree, including all nested messages and raw byte payloads, into
> arena-allocated structs.

#### `deep_parse`: parse + walk Campaign → Operations → every Hacker handle

| Size   |     Bytes |   roto (ns) |    upb (ns) | roto speedup |
| ------ | --------: | ----------: | ----------: | -----------: |
| tiny   |       588 |       385.3 |       596.8 |    **1.55×** |
| small  |    20,265 |    13,374.0 |    22,321.6 |    **1.67×** |
| medium | 2,071,053 | 1,454,400.0 | 4,227,384.3 |    **2.91×** |

> roto pays one extra `::new()` scan per nesting level; upb's walk is pure
> pointer-chasing because everything was decoded upfront. roto is still
> faster overall because its per-level scans cost less than upb's full decode.

#### `field_access`: individual field reads on a pre-parsed message (`small` preset)

| Field                          | roto (ns) | upb (ns) | upb speedup |
| ------------------------------ | --------: | -------: | ----------: |
| `campaign::name`               |      14.3 |     1.11 |   **12.9×** |
| `campaign::total_bytes_stolen` |       7.1 |     1.74 |    **4.1×** |
| `operation::codename`          |      13.8 |     1.76 |    **7.8×** |
| `operation::timestamp`         |       9.7 |     1.40 |    **6.9×** |
| `operation::successful`        |       7.0 |     1.13 |    **6.1×** |
| `hacker::handle`               |      14.4 |     1.56 |    **9.2×** |
| `hacker::skill_level` (f32)    |       7.7 |     1.76 |    **4.4×** |
| `hacker::is_elite` (bool)      |       7.5 |     1.14 |    **6.6×** |
| `worm::polymorphic` (bool)     |       7.5 |     1.76 |    **4.2×** |
| `worm::payload` (bytes)        |      16.6 |     1.75 |    **9.5×** |

> After parsing, upb field reads are direct struct-member lookups (~1–2 ns).
> roto re-decodes the value at its pre-recorded byte offset on every call
> (~7–17 ns). This is the one area where upb holds a clear advantage.

#### `iterate`: count repeated fields (parse included in every iteration)

| Benchmark          | Size   | roto (ns) |    upb (ns) | roto speedup |
| ------------------ | ------ | --------: | ----------: | -----------: |
| `count_operations` | tiny   |      50.0 |       600.2 |    **12.0×** |
| `count_operations` | small  |     393.7 |    22,702.9 |    **57.7×** |
| `count_operations` | medium |  36,628.0 | 4,193,874.0 |   **114.5×** |
| `count_all_crew`   | tiny   |     235.3 |       610.2 |     **2.6×** |
| `count_all_crew`   | small  |   4,369.5 |    23,109.0 |     **5.3×** |
| `count_all_crew`   | medium | 444,930.0 | 4,151,181.5 |     **9.3×** |

> `count_operations` includes parsing; upb's O(1) array-length read is
> dominated by its full-decode cost, so roto wins by a large margin, as it
> does in `shallow_parse`. `count_all_crew` also parses each `Operation`
> sub-message; roto's per-level scans remain cheaper than upb's full decode.

### Interpreting the comparison

The two libraries have fundamentally different models:

- **roto `shallow_parse`** does one linear scan recording byte offsets, with no
  allocation and no field decoding. Subsequent field reads decode on demand at
  the stored offset.
- **upb `Campaign_parse`** fully decodes the entire message tree into
  arena-allocated structs upfront. Subsequent field reads are direct struct
  member lookups (~1–2 ns).

The result: roto's parse is faster and allocation-free; upb's field access
after parsing is faster. For workloads that read fields many times after a
single parse, upb's cheaper per-read cost works in its favor; for workloads
that read a handful of fields from large messages, roto wins.
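
As a rough way to reason about this trade-off, the sketch below models total cost as one parse plus a fixed cost per field read and solves for the number of reads at which the two libraries cost the same. The model and the function names are assumptions made for illustration: it covers top-level reads only and ignores the extra `::new()` scans roto needs for nested messages, as well as cache effects.

```rust
/// Total time (ns) to parse once and then perform `reads` field reads.
fn total_ns(parse_ns: f64, per_read_ns: f64, reads: f64) -> f64 {
    parse_ns + per_read_ns * reads
}

/// Number of reads at which both libraries cost the same, or `None`
/// if one library is cheaper on both parse and per-read cost.
fn break_even_reads(
    roto_parse: f64,
    roto_read: f64,
    upb_parse: f64,
    upb_read: f64,
) -> Option<f64> {
    let parse_gap = upb_parse - roto_parse; // roto's head start
    let read_gap = roto_read - upb_read;    // what roto loses per read
    if parse_gap <= 0.0 || read_gap <= 0.0 {
        return None;
    }
    Some(parse_gap / read_gap)
}
```

Plugging in the `small` preset figures from the tables above (`shallow_parse` 182.9 ns vs 22,619.2 ns; `campaign::name` 14.3 ns vs 1.11 ns) gives a break-even of roughly 1,700 repeated reads under this simplified model.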

## Literature

https://protobuf.dev/programming-guides/encoding/