charles 04ef952a58 Add Builder::with for updating messages
Allow creating a new message based on an existing one, overriding
specific fields while copying the remaining original fields.
2026-05-04 20:11:54 -07:00

# roto
Zero-allocation Rust protobuf reader and writer.
## Overview
Instead of deserializing binary protobuf data into Rust structs, roto scans a message _once_ on
construction — recording the byte offset of each field — then reads fields on demand directly from
the original bytes. No heap allocation, no data copying, no full deserialization upfront.
Writing works the same way: you provide a fixed buffer and a builder writes fields directly into it,
returning a slice of the bytes written.
## Design
`protoc` generates a `CodeGeneratorRequest` message; `protoc-gen-roto` (in
`src/bin/protoc-gen-roto.rs`) reads this from stdin, generates Rust source files, and writes a
`CodeGeneratorResponse` to stdout. `protoc` then writes those `.rs` files to disk. The generated
files are included directly in the crate that uses the protobuf messages.
Sample usage:
```sh
protoc -Iproto/ proto/hackers.proto --plugin=./target/debug/protoc-gen-roto --roto_out=src/
```
This generates a single file, `src/hackers.rs`.
## Generated code
For each protobuf message roto generates two types:
- **Reader struct** `MessageName<'a>` — borrows the original byte slice, zero-copy.
- **Builder struct** `MessageNameBuilder<'b>` — writes into a caller-provided `&mut [u8]`.
Nested message types are placed in a `pub mod message_name { ... }` module (snake_case of the
parent message name) within the same generated file.
## Sample usage
Given this proto definition:
```proto
message Hello {
  string hello_world = 1;
  message InnerWorld {
    string thought = 1;
  }
  InnerWorld inner_world = 2;
}
```
### Reading
```rust
fn parse_proto(data: &[u8]) -> roto::Result<String> {
    // Scan the data once, recording field offsets
    let hello = Hello::new(data)?;
    // String fields return &str borrowed from the original bytes (zero-copy)
    let hello_world: &str = hello.hello_world()?;
    // Nested message fields return &[u8]; construct the nested reader from those bytes
    let inner_bytes: &[u8] = hello.inner_world()?;
    let inner_world = hello::InnerWorld::new(inner_bytes)?;
    let thought: &str = inner_world.thought()?;
    Ok(format!("{} is about {}", hello_world, thought))
}
```
Fields absent from the binary data return `Err(roto::RotoError::FieldNotFound)`.
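Since proto3 fields are frequently absent, callers may want to treat `FieldNotFound` as "no value" rather than as a failure. The helper below is a minimal sketch of that idea, not part of roto: `optional` is a hypothetical name, and the `RotoError` and `RotoResult` definitions are local stand-ins (including the invented `Malformed` variant) so that the snippet compiles on its own.

```rust
// Local stand-ins so the sketch is self-contained. In real code these would be
// `roto::RotoError` and `roto::Result`; the `Malformed` variant is hypothetical.
#[derive(Debug, PartialEq)]
enum RotoError {
    FieldNotFound,
    Malformed,
}
type RotoResult<T> = std::result::Result<T, RotoError>;

/// Map "field absent" to `None` and pass every other error through unchanged.
fn optional<T>(field: RotoResult<T>) -> RotoResult<Option<T>> {
    match field {
        Ok(value) => Ok(Some(value)),
        Err(RotoError::FieldNotFound) => Ok(None),
        Err(other) => Err(other),
    }
}
```

With such a helper, a reader call could look like `optional(inner_world.thought())?.unwrap_or("")`, which still propagates genuine decoding errors through `?`.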
### Writing
Nested messages must be serialized into a scratch buffer first, then embedded as raw bytes in the
outer builder.
```rust
fn build_proto(buf: &mut [u8]) -> roto::Result<&[u8]> {
    // Serialize the inner message first
    let mut inner_buf = [0u8; 256];
    let inner_bytes = hello::InnerWorldBuilder::builder(&mut inner_buf)
        .thought("some thought")?
        .finish()?;
    // Build the outer message, embedding the serialized inner bytes
    let written = HelloBuilder::builder(buf)
        .hello_world("some world")?
        .inner_world(inner_bytes)?
        .finish()?; // yields &'b mut [u8]: the written portion of buf
    // Result<&mut [u8]> does not coerce to Result<&[u8]>, so unwrap and re-wrap
    Ok(written)
}
```
Builder methods consume `self` and return `Result<Self>`, enabling `?`-based chaining.
`finish()` returns `Result<&'b mut [u8]>` — a slice of the portion of the buffer that was written.
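The consuming-builder shape described above can be sketched without roto. Everything in the block below is hypothetical (`SketchBuilder`, `BuildError`, `bytes_field`); it only illustrates how methods that take `self` and return `Result<Self>` chain with `?` over a caller-provided buffer. For brevity it supports only field numbers 1 to 15 and payloads under 128 bytes, so the tag and the length each fit in one byte.

```rust
#[derive(Debug, PartialEq)]
enum BuildError {
    BufferTooSmall,
    Unsupported,
}

/// Hypothetical stand-in for a generated builder.
struct SketchBuilder<'b> {
    buf: &'b mut [u8],
    len: usize, // number of bytes written so far
}

impl<'b> SketchBuilder<'b> {
    fn builder(buf: &'b mut [u8]) -> Self {
        Self { buf, len: 0 }
    }

    /// Append one length-delimited field (wire type 2), consuming the builder.
    fn bytes_field(mut self, number: u8, value: &[u8]) -> Result<Self, BuildError> {
        if number == 0 || number > 15 || value.len() > 127 {
            return Err(BuildError::Unsupported);
        }
        let end = self.len + 2 + value.len();
        let dst = self
            .buf
            .get_mut(self.len..end)
            .ok_or(BuildError::BufferTooSmall)?;
        dst[0] = (number << 3) | 2; // tag = field number and wire type
        dst[1] = value.len() as u8; // single-byte length prefix
        dst[2..].copy_from_slice(value);
        self.len = end;
        Ok(self)
    }

    /// Return the written prefix of the buffer.
    fn finish(self) -> Result<&'b mut [u8], BuildError> {
        let Self { buf, len } = self;
        Ok(&mut buf[..len])
    }
}

/// Usage: each `?` either hands the builder on or returns the error.
fn build(buf: &mut [u8]) -> Result<&[u8], BuildError> {
    let written = SketchBuilder::builder(buf)
        .bytes_field(1, b"some world")?
        .bytes_field(2, b"\x0a\x01x")? // a pre-serialized nested message
        .finish()?;
    Ok(written)
}
```

Taking `self` by value means a failed write drops the builder, so a partially written message cannot be finished by accident.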
### Updating messages
You can read a message, modify specific fields, and use `.with()` to copy the remaining fields from the original binary.
```rust
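// Two reference parameters defeat lifetime elision, so tie the result to `buf` explicitly
fn update_proto<'b>(data: &[u8], buf: &'b mut [u8]) -> roto::Result<&'b [u8]> {
    let msg = Message::new(data)?;
    let mut builder = MessageBuilder::builder(buf);
    // Override only the fields that need to change
    if msg.foo()? == "bar" {
        builder = builder.foo("foosbar")?;
    }
    // Copy every remaining field from the original message
    let written = builder.with(&msg)?.finish()?;
    Ok(written)
}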
```
### Repeated fields
Accessors for repeated fields return a `RepeatedFieldIterator<'a>` whose items are `Result<(&[u8], WireType)>`.
```rust
// `tags` stands for any repeated field; the `Hello` definition above does not declare one
let hello = Hello::new(data)?;
for item in hello.tags() {
    let (value_bytes, _wire_type) = item?;
    // decode value_bytes according to the expected wire type
}
```
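Decoding one item might look like the sketch below for a repeated string field. It rests on two assumptions that the text above does not confirm: that `value_bytes` holds the payload without its length prefix, and that the wire type for strings is exposed as a variant like `Len`. The `WireType` enum here is a local stand-in named after the protobuf encoding guide, not roto's own type, and `decode_string_item` is a hypothetical helper.

```rust
// Local stand-in for roto's `WireType`; the variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WireType {
    Varint,
    I64,
    Len,
    I32,
}

/// Interpret one repeated-field item as a UTF-8 string.
/// Returns `None` if the wire type is wrong or the bytes are not valid UTF-8.
fn decode_string_item(value_bytes: &[u8], wire_type: WireType) -> Option<&str> {
    if wire_type != WireType::Len {
        return None;
    }
    std::str::from_utf8(value_bytes).ok()
}
```

Because `std::str::from_utf8` validates in place, the returned `&str` still borrows from the original message bytes.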
## Runtime API
The core runtime in `src/lib.rs` provides:
- `ProtoAccessor<'a>` — scans a message's fields and reads values at recorded offsets.
- `ProtoBuilder<'a>` — writes fields into a provided `&mut [u8]` buffer.
- `FieldIterator<'a>` / `RepeatedFieldIterator<'a>` — iterators over fields and repeated fields.
- `Tag`, `WireType` — protobuf encoding primitives.
- `read_varint`, `write_varint`, `skip_value` — low-level wire-format helpers.
- `RotoError`, `Result<T>` — error type and alias.
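For readers unfamiliar with the wire format, the following is a minimal sketch of base-128 varint helpers of the kind listed above. The function names mirror `read_varint` and `write_varint`, but the signatures and `Option`-based error handling are assumptions made for illustration; the real functions in `src/lib.rs` may differ.

```rust
/// Decode a varint from the start of `buf`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is truncated or runs past 10 bytes.
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        // The low 7 bits carry data, least-significant group first.
        value |= u64::from(byte & 0x7f) << (7 * i);
        // A clear high bit marks the final byte.
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Encode `value` as a varint at the start of `buf`.
/// Returns the number of bytes written, or `None` if `buf` is too small.
fn write_varint(mut value: u64, buf: &mut [u8]) -> Option<usize> {
    let mut i = 0;
    loop {
        let slot = buf.get_mut(i)?;
        if value < 0x80 {
            *slot = value as u8;
            return Some(i + 1);
        }
        *slot = ((value as u8) & 0x7f) | 0x80; // more bytes follow
        value >>= 7;
        i += 1;
    }
}
```

For example, 300 encodes to the two bytes `0xAC 0x02`, matching the worked example in the protobuf encoding guide.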
## High-level design
On construction (`MessageName::new(data)`), the generated reader struct iterates the binary once
using `FieldIterator` and records the byte offset of each field's tag. Subsequent field accesses
call `ProtoAccessor::get_value_at(offset)` — no re-scanning. For repeated fields, the start and
end offsets of the field range are recorded to bound iteration efficiently.
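The single-pass scan described above can be approximated with the sketch below. It is not roto's implementation: `scan_offsets` is a hypothetical function that records the tag offset of field numbers `1..=N` in a fixed-size array (so it does not allocate), and it skips each value using only the wire type and, for length-delimited fields, the length prefix.

```rust
/// Read a varint starting at `pos`; return the value and the position after it.
fn read_varint(data: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    let mut value = 0u64;
    let mut shift = 0u32;
    while shift < 64 {
        let byte = *data.get(pos)?;
        pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, pos));
        }
        shift += 7;
    }
    None
}

/// Record the byte offset of the tag of each top-level field numbered 1..=N.
/// Nested messages are skipped as opaque blobs; nothing is decoded.
fn scan_offsets<const N: usize>(data: &[u8]) -> Option<[Option<usize>; N]> {
    let mut offsets = [None; N];
    let mut pos = 0;
    while pos < data.len() {
        let tag_offset = pos;
        let (tag, after_tag) = read_varint(data, pos)?;
        let field_number = tag >> 3;
        pos = match tag & 0x7 {
            0 => read_varint(data, after_tag)?.1, // varint: read past it
            1 => after_tag + 8,                   // fixed 64-bit
            2 => {
                // length-delimited: jump over the payload via its length prefix
                let (len, start) = read_varint(data, after_tag)?;
                if len > (data.len() - start) as u64 {
                    return None;
                }
                start + len as usize
            }
            5 => after_tag + 4, // fixed 32-bit
            _ => return None,   // groups and unknown wire types are not handled
        };
        if pos > data.len() {
            return None; // truncated fixed-width value
        }
        if field_number >= 1 && field_number <= N as u64 {
            // If a field repeats, the last occurrence wins in this sketch.
            offsets[(field_number - 1) as usize] = Some(tag_offset);
        }
    }
    Some(offsets)
}
```

The cost of this loop depends on the number of top-level fields rather than on the total byte count, because a large nested payload is skipped in one step.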
## Benchmarks
Two benchmark suites share the same binary data files and the same four
measurement groups:
| Group | What is timed |
| --------------- | ------------------------------------------------------- |
| `shallow_parse` | Become ready to read any field (one scan / full decode) |
| `deep_parse` | Walk the full tree: Campaign → Operations → Hackers |
| `field_access` | Read individual fields on an already-parsed message |
| `iterate` | Count top-level and nested repeated fields |
### 1 — Generate the shared data files (do this once)
Data files are written to `data/bench/`.
```sh
cargo run --release --bin gen_bench_data -- --preset tiny
cargo run --release --bin gen_bench_data -- --preset small
cargo run --release --bin gen_bench_data -- --preset medium
cargo run --release --bin gen_bench_data -- --preset large
```
For even larger inputs use `--preset huge` (~500 MB) or set the knobs
directly:
```sh
# ~50 MB: 500 operations × 100 KB stolen_data each
cargo run --release --bin gen_bench_data -- --ops 500 --stolen-kb 100 --output data/bench/50mb.pb
```
### 2 — Rust benchmark (criterion)
```sh
cargo bench --bench hackers_bench
```
HTML reports are written to `target/criterion/`. Run a single group:
```sh
cargo bench --bench hackers_bench -- shallow_parse
```
### 3 — C / upb benchmark
Requires protobuf ≥ 21 with `protoc-gen-upb` (ships with modern `protoc`).
```sh
cd upb_test
make # compiles hackers_bench from the pre-generated upb files
./hackers_bench
```
To regenerate the upb C files from `proto/hackers.proto`:
```sh
cd upb_test && make regen
```
### 4 — Results
Measured on Linux x86-64 with the four standard presets. Rust times are
criterion medians; C/upb times are the custom runner's mean over ≥ 0.5 s.
#### `shallow_parse` — cost to become ready to read any field
| Size | Bytes | roto (ns) | upb (ns) | roto speedup |
| ------ | ----------: | --------: | -----------: | -----------: |
| tiny | 588 | 32.7 | 606.2 | **18.5×** |
| small | 20,265 | 182.9 | 22,619.2 | **123.7×** |
| medium | 2,071,053 | 16,632.0 | 5,346,977.2 | **321×** |
| large | 102,608,384 | 1,618.6 | 41,132,079.7 | **25,411×** |
> roto's cost is O(number of top-level fields): it records field offsets by
> jumping past nested blobs using their length prefixes. upb fully decodes the
> entire tree — including all nested messages and raw byte payloads — into
> arena-allocated structs.
#### `deep_parse` — parse + walk Campaign → Operations → every Hacker handle
| Size | Bytes | roto (ns) | upb (ns) | roto speedup |
| ------ | --------: | ----------: | ----------: | -----------: |
| tiny | 588 | 385.3 | 596.8 | **1.55×** |
| small | 20,265 | 13,374.0 | 22,321.6 | **1.67×** |
| medium | 2,071,053 | 1,454,400.0 | 4,227,384.3 | **2.91×** |
> roto pays one extra `::new()` scan per nesting level; upb's walk is pure
> pointer-chasing because everything was decoded upfront. roto is still
> faster overall because its per-level scans cost less than upb's full decode.
#### `field_access` — individual field reads on a pre-parsed message (`small` preset)
| Field | roto (ns) | upb (ns) | upb speedup |
| ------------------------------ | --------: | -------: | ----------: |
| `campaign::name` | 14.3 | 1.11 | **12.9×** |
| `campaign::total_bytes_stolen` | 7.1 | 1.74 | **4.1×** |
| `operation::codename` | 13.8 | 1.76 | **7.8×** |
| `operation::timestamp` | 9.7 | 1.40 | **6.9×** |
| `operation::successful` | 7.0 | 1.13 | **6.1×** |
| `hacker::handle` | 14.4 | 1.56 | **9.2×** |
| `hacker::skill_level` (f32) | 7.7 | 1.76 | **4.4×** |
| `hacker::is_elite` (bool) | 7.5 | 1.14 | **6.6×** |
| `worm::polymorphic` (bool) | 7.5 | 1.76 | **4.2×** |
| `worm::payload` (bytes) | 16.6 | 1.75 | **9.5×** |
> After parsing, upb field reads are direct struct-member lookups (~1–2 ns).
> roto re-decodes the value at its pre-recorded byte offset on every call
> (~7–17 ns). This is the one area where upb holds a clear advantage.
#### `iterate` — count repeated fields (parse included in every iteration)
| Benchmark | Size | roto (ns) | upb (ns) | roto speedup |
| ------------------ | ------ | --------: | ----------: | -----------: |
| `count_operations` | tiny | 50.0 | 600.2 | **12.0×** |
| `count_operations` | small | 393.7 | 22,702.9 | **57.7×** |
| `count_operations` | medium | 36,628.0 | 4,193,874.0 | **114.5×** |
| `count_all_crew` | tiny | 235.3 | 610.2 | **2.6×** |
| `count_all_crew` | small | 4,369.5 | 23,109.0 | **5.3×** |
| `count_all_crew` | medium | 444,930.0 | 4,151,181.5 | **9.3×** |
> `count_operations` includes parsing; upb's O(1) array-length read is
> dominated by its full-decode cost, so roto wins for the same reason as in
> `shallow_parse`, though by a smaller margin. `count_all_crew` also parses
> each `Operation` sub-message; roto's per-level scans remain cheaper than
> upb's full decode.
### Interpreting the comparison
The two libraries have fundamentally different models:
- **roto `shallow_parse`** does one linear scan recording byte offsets — no
allocation, no field decoding. Subsequent field reads decode on demand at
the stored offset.
- **upb `Campaign_parse`** fully decodes the entire message tree into
arena-allocated structs upfront. Subsequent field reads are direct struct
member lookups (~1 ns).
The result: roto's parse is faster and allocation-free; upb's field access
after parsing is faster. For workloads that read every field, especially more
than once, the costs can invert; for workloads that read a handful of fields
from large messages roto wins.
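One rough way to locate the crossover is a linear cost model, `total = parse + n × read`, solved for the number of reads `n` at which both libraries cost the same. The function below is an illustrative sketch, not part of the benchmark suite; `break_even_reads` is a hypothetical name, and the model ignores nested `::new()` scans and cache effects.

```rust
/// Number of field reads at which library A (cheap parse, dearer reads) and
/// library B (dear parse, cheap reads) cost the same under a linear model.
/// Returns `None` when A is not cheaper to parse or not dearer to read,
/// because then no crossover exists.
fn break_even_reads(parse_a: f64, read_a: f64, parse_b: f64, read_b: f64) -> Option<f64> {
    let parse_gap = parse_b - parse_a; // what A saves at parse time
    let read_gap = read_a - read_b; // what A pays extra per read
    (parse_gap > 0.0 && read_gap > 0.0).then(|| parse_gap / read_gap)
}
```

Plugging in the `small` preset figures from the tables above (roto: 182.9 ns parse and 14.3 ns per `campaign::name` read; upb: 22,619.2 ns parse and 1.11 ns per read) gives a break-even of roughly 1,700 reads of that field per parsed message.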
## Literature
https://protobuf.dev/programming-guides/encoding/