Add benchmark

2026-05-04 14:40:11 -07:00
parent b03ec9eba9
commit 4a6a09cff1
18 changed files with 3922 additions and 12 deletions
@@ -18,6 +18,14 @@ returning a slice of the bytes written.
 `CodeGeneratorResponse` to stdout. `protoc` then writes those `.rs` files to disk. The generated
 files are included directly in the crate that uses the protobuffers.

+Sample usage:
+
+```sh
+protoc -Iproto/ proto/hackers.proto --plugin=./target/debug/protoc-gen-roto --roto_out=src/
+```
+
+This generates a single file, `src/hackers.rs`.
+
 ## Generated code

 For each protobuf message roto generates two types:
@@ -117,6 +125,81 @@ using `FieldIterator` and records the byte offset of each field's tag. Subsequen
 call `ProtoAccessor::get_value_at(offset)` — no re-scanning. For repeated fields, the start and
 end offsets of the field range are recorded to bound iteration efficiently.

+## Benchmarks
+
+Two benchmark suites share the same binary data files and the same four
+measurement groups:
+
+| Group           | What is timed                                                     |
+| --------------- | ----------------------------------------------------------------- |
+| `shallow_parse` | Become ready to read any field (roto: one scan; upb: full decode) |
+| `deep_parse`    | Walk the full tree: Campaign → Operations → Hackers               |
+| `field_access`  | Read individual fields on an already-parsed message               |
+| `iterate`       | Count top-level and nested repeated fields                        |
+
+### 1 — Generate the shared data files (do this once)
+
+Data files are written to `data/bench/`.
+
+```sh
+cargo run --release --bin gen_bench_data -- --preset tiny
+cargo run --release --bin gen_bench_data -- --preset small
+cargo run --release --bin gen_bench_data -- --preset medium
+cargo run --release --bin gen_bench_data -- --preset large
+```
+
+For even larger inputs use `--preset huge` (~500 MB) or set the knobs
+directly:
+
+```sh
+# ~50 MB: 500 operations × 100 KB stolen_data each
+cargo run --release --bin gen_bench_data -- --ops 500 --stolen-kb 100 --output data/bench/50mb.pb
+```
+
+### 2 — Rust benchmark (criterion)
+
+```sh
+cargo bench --bench hackers_bench
+```
+
+HTML reports are written to `target/criterion/`. To run a single group, pass its name as a filter:
+
+```sh
+cargo bench --bench hackers_bench -- shallow_parse
+```
+
+### 3 — C / upb benchmark
+
+Requires protobuf ≥ 21 with `protoc-gen-upb` (ships with modern `protoc`).
+
+```sh
+cd upb_test
+make          # compiles hackers_bench from the pre-generated upb files
+./hackers_bench
+```
+
+To regenerate the upb C files from `proto/hackers.proto`:
+
+```sh
+cd upb_test && make regen
+```
+
+### Interpreting the comparison
+
+The two libraries have fundamentally different models:
+
+- **roto `shallow_parse`** does one linear scan recording byte offsets — no
+  allocation, no field decoding. Subsequent field reads decode on demand at
+  the stored offset.
+- **upb `Campaign_parse`** fully decodes the entire message tree into
+  arena-allocated structs upfront. Subsequent field reads are direct struct
+  member lookups (~1 ns).
+
+The result: roto's parse is faster and allocation-free, while upb's field access
+after parsing is faster. For workloads that read every field, roto's on-demand
+decoding adds up and upb's upfront decode tends to pay off; for workloads that
+read a handful of fields from large messages, roto wins.
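
The two models can be illustrated with a toy sketch in Rust. This is not roto's or upb's code: `scan`, `lazy_varint`, `decode_all`, `Value`, `Index`, and the `MAX_FIELD` cap are hypothetical names, and the sketch assumes a flat message with only top-level fields. The lazy path records value offsets in a fixed array and decodes a field only when asked; the eager path decodes everything into owned values so later reads are plain lookups.

```rust
use std::collections::HashMap;

const MAX_FIELD: usize = 15; // illustrative cap so the index fits in a fixed array
type Index = [Option<(u64, usize)>; MAX_FIELD + 1]; // field -> (wire type, value offset)

/// Decode a base-128 varint at `pos`; returns (value, position after it).
fn read_varint(buf: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    let (mut value, mut shift) = (0u64, 0u32);
    loop {
        let byte = *buf.get(pos)?;
        pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, pos));
        }
        shift += 7;
        if shift >= 64 {
            return None; // malformed: varint longer than 10 bytes
        }
    }
}

/// Position just past a field value of the given wire type.
fn skip_value(buf: &[u8], wire_type: u64, pos: usize) -> Option<usize> {
    let end = match wire_type {
        0 => read_varint(buf, pos)?.1, // varint
        1 => pos + 8,                  // fixed64
        2 => {
            let (len, start) = read_varint(buf, pos)?; // length-delimited
            if len > buf.len() as u64 {
                return None; // length prefix points past the buffer
            }
            start + len as usize
        }
        5 => pos + 4,     // fixed32
        _ => return None, // groups are not handled in this sketch
    };
    if end <= buf.len() { Some(end) } else { None }
}

/// Lazy model: one pass that records where each field's value starts.
/// No value is decoded and nothing is heap-allocated.
fn scan(buf: &[u8]) -> Option<Index> {
    let mut index: Index = [None; MAX_FIELD + 1];
    let mut pos = 0;
    while pos < buf.len() {
        let (tag, value_pos) = read_varint(buf, pos)?;
        let (field, wire_type) = ((tag >> 3) as usize, tag & 7);
        if let Some(slot) = index.get_mut(field) {
            *slot = Some((wire_type, value_pos)); // fields above MAX_FIELD are skipped
        }
        pos = skip_value(buf, wire_type, value_pos)?;
    }
    Some(index)
}

/// Lazy read: decode a varint field on demand at its recorded offset.
fn lazy_varint(buf: &[u8], index: &[Option<(u64, usize)>], field: usize) -> Option<u64> {
    match *index.get(field)? {
        Some((0, value_pos)) => Some(read_varint(buf, value_pos)?.0),
        _ => None,
    }
}

#[derive(Debug, PartialEq)]
enum Value {
    Varint(u64),
    Bytes(Vec<u8>), // copied out of the input, standing in for arena allocation
}

/// Eager model: decode every field up front into owned values.
fn decode_all(buf: &[u8]) -> Option<HashMap<u64, Value>> {
    let mut fields = HashMap::new();
    let mut pos = 0;
    while pos < buf.len() {
        let (tag, value_pos) = read_varint(buf, pos)?;
        let end = skip_value(buf, tag & 7, value_pos)?;
        let value = match tag & 7 {
            0 => Value::Varint(read_varint(buf, value_pos)?.0),
            2 => {
                let (_, start) = read_varint(buf, value_pos)?;
                Value::Bytes(buf[start..end].to_vec())
            }
            _ => Value::Bytes(buf[value_pos..end].to_vec()), // fixed32/fixed64 kept raw
        };
        fields.insert(tag >> 3, value);
        pos = end;
    }
    Some(fields)
}

fn main() {
    // field 1 = varint 150, field 2 = bytes "hi", field 3 = varint 7
    let msg = [0x08u8, 0x96, 0x01, 0x12, 0x02, b'h', b'i', 0x18, 0x07];

    let index = scan(&msg).expect("well-formed message");
    assert_eq!(lazy_varint(&msg, &index, 3), Some(7)); // decodes only field 3

    let fields = decode_all(&msg).expect("well-formed message");
    assert_eq!(fields.get(&1), Some(&Value::Varint(150))); // plain map lookup
}
```

In this sketch `scan` touches every tag but copies nothing, so its cost grows with the number of fields rather than the payload size, while `decode_all` pays for every copy whether or not the value is ever read.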
+
 ## Literature

 https://protobuf.dev/programming-guides/encoding/