Add benchmark

2026-05-04 14:40:11 -07:00
parent b03ec9eba9
commit 4a6a09cff1
18 changed files with 3922 additions and 12 deletions
@@ -18,6 +18,14 @@ returning a slice of the bytes written.
`CodeGeneratorResponse` to stdout. `protoc` then writes those `.rs` files to disk. The generated
files are included directly in the crate that uses the protobuffers.
Sample usage:
```sh
protoc -Iproto/ proto/hackers.proto --plugin=./target/debug/protoc-gen-roto --roto_out=src/
```
This generates a single file, `src/hackers.rs`.
## Generated code
For each protobuf message roto generates two types:
@@ -117,6 +125,81 @@ using `FieldIterator` and records the byte offset of each field's tag. Subsequen
call `ProtoAccessor::get_value_at(offset)` — no re-scanning. For repeated fields, the start and
end offsets of the field range are recorded to bound iteration efficiently.
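The scan-once, decode-on-demand idea can be sketched in a few standalone functions. This is a minimal illustration over the protobuf wire format, not roto's implementation: `index_fields`, `get_varint_at`, and `get_bytes_at` are hypothetical names standing in for the roles of `FieldIterator` and `ProtoAccessor::get_value_at`, and the sketch stores its index in a `Vec`, which the real accessor may well avoid. Groups (wire types 3 and 4) and repeated-field ranges are left out.

```rust
// Hypothetical sketch: these names are illustrative, not roto's API.

/// Decode a base-128 varint starting at `pos`; returns (value, next position).
fn read_varint(buf: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    let mut value = 0u64;
    let mut shift = 0u32;
    while shift < 64 {
        let byte = *buf.get(pos)?;
        pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, pos));
        }
        shift += 7;
    }
    None // longer than 10 bytes: malformed varint
}

/// Skip the field whose tag starts at `tag_pos`; returns (field number, end position).
fn skip_field(buf: &[u8], tag_pos: usize) -> Option<(u32, usize)> {
    let (tag, pos) = read_varint(buf, tag_pos)?;
    let end = match tag & 7 {
        0 => read_varint(buf, pos)?.1, // varint
        1 => pos + 8,                  // fixed64
        2 => {
            // length-delimited: bytes, string, or nested message
            let (len, body) = read_varint(buf, pos)?;
            if len > (buf.len() - body) as u64 {
                return None;
            }
            body + len as usize
        }
        5 => pos + 4,                  // fixed32
        _ => return None,              // groups are not handled in this sketch
    };
    if end > buf.len() {
        return None;
    }
    Some(((tag >> 3) as u32, end))
}

/// One linear pass: record (field number, tag offset) for each top-level field.
fn index_fields(buf: &[u8]) -> Option<Vec<(u32, usize)>> {
    let mut index = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let (field, next) = skip_field(buf, pos)?;
        index.push((field, pos));
        pos = next;
    }
    Some(index)
}

/// Decode a varint field on demand from its recorded tag offset.
fn get_varint_at(buf: &[u8], tag_offset: usize) -> Option<u64> {
    let (tag, pos) = read_varint(buf, tag_offset)?;
    if tag & 7 != 0 {
        return None;
    }
    Some(read_varint(buf, pos)?.0)
}

/// Borrow the payload of a length-delimited field from its recorded tag offset.
fn get_bytes_at(buf: &[u8], tag_offset: usize) -> Option<&[u8]> {
    let (tag, pos) = read_varint(buf, tag_offset)?;
    if tag & 7 != 2 {
        return None;
    }
    let (len, body) = read_varint(buf, pos)?;
    if len > (buf.len() - body) as u64 {
        return None;
    }
    Some(&buf[body..body + len as usize])
}
```

For the bytes `08 96 01 12 07` followed by the ASCII string `testing`, `index_fields` records field 1 at offset 0 and field 2 at offset 3. Later reads jump straight to those offsets instead of re-scanning from the start of the buffer.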
## Benchmarks
Two benchmark suites share the same binary data files and the same four
measurement groups:
| Group | What is timed |
| --------------- | ------------------------------------------------------- |
| `shallow_parse` | Become ready to read any field (one scan / full decode) |
| `deep_parse` | Walk the full tree: Campaign → Operations → Hackers |
| `field_access` | Read individual fields on an already-parsed message |
| `iterate` | Count top-level and nested repeated fields |
### 1 — Generate the shared data files (do this once)
Data files are written to `data/bench/`.
```sh
cargo run --release --bin gen_bench_data -- --preset tiny
cargo run --release --bin gen_bench_data -- --preset small
cargo run --release --bin gen_bench_data -- --preset medium
cargo run --release --bin gen_bench_data -- --preset large
```
For even larger inputs use `--preset huge` (~500 MB) or set the knobs
directly:
```sh
# ~50 MB: 500 operations × 100 KB stolen_data each
cargo run --release --bin gen_bench_data -- --ops 500 --stolen-kb 100 --output data/bench/50mb.pb
```
### 2 — Rust benchmark (criterion)
```sh
cargo bench --bench hackers_bench
```
HTML reports are written to `target/criterion/`. To run a single group, pass its name as a filter:
```sh
cargo bench --bench hackers_bench -- shallow_parse
```
### 3 — C / upb benchmark
Requires protobuf ≥ 21 with `protoc-gen-upb` (ships with modern `protoc`).
```sh
cd upb_test
make # compiles hackers_bench from the pre-generated upb files
./hackers_bench
```
To regenerate the upb C files from `proto/hackers.proto`:
```sh
cd upb_test && make regen
```
### Interpreting the comparison
The two libraries have fundamentally different models:
- **roto `shallow_parse`** does one linear scan recording byte offsets — no
allocation, no field decoding. Subsequent field reads decode on demand at
the stored offset.
- **upb `Campaign_parse`** fully decodes the entire message tree into
arena-allocated structs upfront. Subsequent field reads are direct struct
member lookups (~1 ns).
The result: roto's parse is faster and allocation-free, while upb's field
access after parsing is faster. Workloads that read a handful of fields from
large messages favor roto. Workloads that read every field reverse the
trade-off: upb's upfront decode is amortized over many cheap reads.
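One way to reason about where the trade-off flips is a simple linear cost model: total time is the parse cost plus a per-read cost multiplied by the number of field reads. The sketch below is only that model. The function names are hypothetical, and it assumes every read costs the same, which real messages will not satisfy exactly. Feed it the numbers the benchmarks report for `shallow_parse` and `field_access` on your own data.

```rust
// Hypothetical cost model, not part of either benchmark suite.

/// Total time for one message: parse once, then perform `reads` field reads.
fn total_ns(parse_ns: f64, per_read_ns: f64, reads: u64) -> f64 {
    parse_ns + per_read_ns * reads as f64
}

/// Number of reads at which an eager decoder catches up with a lazy one.
/// Returns `None` when lazy reads are no slower than eager reads, i.e. when
/// extra reads never help the eager decoder catch up.
fn break_even_reads(
    lazy_parse_ns: f64,
    lazy_read_ns: f64,
    eager_parse_ns: f64,
    eager_read_ns: f64,
) -> Option<f64> {
    let parse_gap = eager_parse_ns - lazy_parse_ns; // extra upfront cost of eager decoding
    let read_gap = lazy_read_ns - eager_read_ns;    // extra per-read cost of lazy decoding
    if read_gap <= 0.0 {
        return None;
    }
    Some((parse_gap / read_gap).max(0.0))
}
```

With placeholder inputs of 100 ns and 11 ns per read for the lazy side and 1100 ns and 1 ns per read for the eager side, the break-even is 100 reads; below that the lazy approach is cheaper overall, above it the eager one is. These inputs are illustrative, not measurements.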
## Literature
https://protobuf.dev/programming-guides/encoding/