Rust RPG Lang
An implementation of the RPG language from IBM.
Language reference: https://www.ibm.com/docs/en/i/7.5.0?topic=introduction-overview-rpg-iv-programming-language
Usage
Building
cargo build --release
Running
The grammar demo ships as a standalone binary that loads the embedded BNF grammar, builds a parser, and runs a suite of RPG IV snippets to show the grammar in action:
cargo run --bin demo
You will see output similar to:
=== RPG IV Free-Format Parser ===
[grammar] Loaded successfully.
[parser] Built successfully (all non-terminals resolved).
=== Parsing Examples ===
┌─ simple identifier (identifier) ─────────────────────
│ source : "myVar"
│ result : OK
└──────────────────────────────────────────────
...
=== Summary ===
total : 42
matched : 42
failed : 0
All examples parsed successfully.
Hello World in RPG IV
The following is a complete Hello World program written in RPG IV free-format syntax, as understood by this parser:
hello.rpg:
CTL-OPT DFTACTGRP(*NO);
DCL-S greeting CHAR(25) INZ('Hello, World!');
DCL-PROC main EXPORT;
  DSPLY greeting;
  RETURN;
END-PROC;
Breaking it down:
- CTL-OPT DFTACTGRP(*NO); - control option spec declaring that the program does not run in the default activation group
- DCL-S greeting CHAR(25) INZ('Hello, World!'); - standalone variable declaration: a 25-character field initialised to 'Hello, World!'
- DCL-PROC main EXPORT; ... END-PROC; - a procedure named main, exported so it can be called as a program entry point
- DSPLY greeting; - displays the value of greeting to the operator message queue
- RETURN; - returns from the procedure
To validate and compile this program, run the compiler on hello.rpg, passing -o main to name the output:
cargo run --release -- -o main hello.rpg
Architecture
The compiler is split across two crates in a Cargo workspace:
| Crate | Role |
|---|---|
| rust-langrpg | Compiler front-end, mid-end, and LLVM back-end |
| rpgrt | C-compatible runtime shared library (librpgrt.so) |
Compilation pipeline
RPG IV source (.rpg)
                 │
                 ▼
┌───────────────────────────────────────────┐
│ 1. BNF validation (bnf crate)             │
│    src/rpg.bnf, embedded at compile       │
│    time via include_str!                  │
└────────────────┬──────────────────────────┘
                 │ parse tree (validation only)
                 ▼
┌───────────────────────────────────────────┐
│ 2. Lowering pass (src/lower.rs)           │
│    Hand-written recursive-descent         │
│    tokenizer + parser → typed AST         │
└────────────────┬──────────────────────────┘
                 │ ast::Program
                 ▼
┌───────────────────────────────────────────┐
│ 3. LLVM code generation (src/codegen.rs)  │
│    inkwell bindings → LLVM IR module      │
└────────────────┬──────────────────────────┘
                 │ .o object file
                 ▼
┌───────────────────────────────────────────┐
│ 4. Linking (cc + librpgrt.so)             │
│    Produces a standalone Linux ELF        │
└───────────────────────────────────────────┘
Stage 1 — BNF validation (src/rpg.bnf + bnf crate)
The RPG IV free-format grammar is encoded in BNF notation in src/rpg.bnf and embedded at compile time with include_str!. At startup the compiler parses the grammar with the bnf crate to build a GrammarParser. Each source file is validated against the top-level <program> rule before any further processing. This stage acts as a gate: malformed source is rejected early with a clear parse error.
Stage 2 — Lowering to a typed AST (src/lower.rs)
The BNF parser only validates structure; it does not produce a typed tree suitable for code generation. A hand-written tokenizer and recursive-descent parser in lower.rs converts the raw source text into the typed Program AST defined in src/ast.rs.
The AST covers the full language surface that the compiler handles:
- Declarations: CTL-OPT, DCL-S, DCL-C, DCL-DS, DCL-F, subroutines
- Procedures: DCL-PROC … END-PROC with DCL-PI … END-PI parameter interfaces
- Statements: assignment, IF / ELSEIF / ELSE, DOW, DOU, FOR, SELECT / WHEN, MONITOR / ON-ERROR, CALLP, DSPLY, RETURN, LEAVE, ITER, LEAVESR, EXSR, CLEAR, RESET, all I/O opcodes
- Expressions: literals, variables, qualified names (ds.field), arithmetic, logical operators, comparisons, built-in functions (%LEN, %TRIM, %SUBST, %SCAN, %EOF, %SIZE, %ADDR, %SQRT, %ABS, %REM, %DIV, and more)
- Types: CHAR, VARCHAR, INT, UNS, FLOAT, PACKED, ZONED, BINDEC, IND, DATE, TIME, TIMESTAMP, POINTER, LIKE, LIKEDS
Unrecognised constructs produce Statement::Unimplemented or placeholder declaration variants rather than hard errors, so the compiler continues to lower the parts it understands.
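As an illustration of the approach, here is a minimal sketch of a hand-written tokenizer and recursive-descent pass for one construct only, DCL-S name CHAR(n) INZ('...');. All names in it (Decl, Token, Parser, lower_decl) are hypothetical and it uses only the standard library; the real AST in src/ast.rs is not reproduced here. The Unimplemented variant mirrors the fallback idea described above: input the sketch cannot lower is kept rather than rejected.

```rust
// Hypothetical, heavily reduced AST; not the definitions from src/ast.rs.
#[derive(Debug, PartialEq)]
pub enum Decl {
    /// DCL-S name CHAR(len) [INZ('text')];
    Standalone { name: String, len: usize, init: Option<String> },
    /// Source text this sketch cannot lower is preserved, not rejected.
    Unimplemented(String),
}

#[derive(Debug, PartialEq, Clone)]
enum Token { Word(String), Int(usize), Str(String), LParen, RParen, Semi }

/// Split source text into tokens; returns None on an unexpected character.
fn tokenize(src: &str) -> Option<Vec<Token>> {
    let chars: Vec<char> = src.chars().collect();
    let (mut i, mut out) = (0, Vec::new());
    while i < chars.len() {
        let c = chars[i];
        match c {
            _ if c.is_whitespace() => i += 1,
            '(' => { out.push(Token::LParen); i += 1 }
            ')' => { out.push(Token::RParen); i += 1 }
            ';' => { out.push(Token::Semi); i += 1 }
            '\'' => {
                // String literal up to the next quote (no escape handling).
                let end = i + 1 + chars[i + 1..].iter().position(|&ch| ch == '\'')?;
                out.push(Token::Str(chars[i + 1..end].iter().collect()));
                i = end + 1;
            }
            _ if c.is_ascii_digit() => {
                let start = i;
                while i < chars.len() && chars[i].is_ascii_digit() { i += 1; }
                let text: String = chars[start..i].iter().collect();
                out.push(Token::Int(text.parse::<usize>().ok()?));
            }
            _ if c.is_ascii_alphabetic() => {
                // Keywords and identifiers; '-' is allowed so DCL-S is one word.
                let start = i;
                while i < chars.len()
                    && (chars[i].is_ascii_alphanumeric() || "-_".contains(chars[i]))
                {
                    i += 1;
                }
                out.push(Token::Word(chars[start..i].iter().collect()));
            }
            _ => return None,
        }
    }
    Some(out)
}

struct Parser { toks: Vec<Token>, pos: usize }

impl Parser {
    fn next(&mut self) -> Option<Token> {
        let tok = self.toks.get(self.pos).cloned();
        self.pos += 1;
        tok
    }
    fn expect(&mut self, want: &Token) -> Option<()> {
        if self.next().as_ref() == Some(want) { Some(()) } else { None }
    }
    fn keyword(&mut self, kw: &str) -> Option<()> {
        match self.next()? {
            Token::Word(w) if w.eq_ignore_ascii_case(kw) => Some(()),
            _ => None,
        }
    }
    /// dcl_s := "DCL-S" ident "CHAR" "(" int ")" [ "INZ" "(" string ")" ] ";"
    fn dcl_s(&mut self) -> Option<Decl> {
        self.keyword("DCL-S")?;
        let name = match self.next()? { Token::Word(w) => w, _ => return None };
        self.keyword("CHAR")?;
        self.expect(&Token::LParen)?;
        let len = match self.next()? { Token::Int(n) => n, _ => return None };
        self.expect(&Token::RParen)?;
        let has_inz = matches!(self.toks.get(self.pos),
            Some(Token::Word(w)) if w.eq_ignore_ascii_case("INZ"));
        let mut init = None;
        if has_inz {
            self.pos += 1;
            self.expect(&Token::LParen)?;
            init = match self.next()? { Token::Str(s) => Some(s), _ => return None };
            self.expect(&Token::RParen)?;
        }
        self.expect(&Token::Semi)?;
        Some(Decl::Standalone { name, len, init })
    }
}

/// Lower one declaration, falling back to Unimplemented instead of failing.
pub fn lower_decl(src: &str) -> Decl {
    tokenize(src)
        .and_then(|toks| Parser { toks, pos: 0 }.dcl_s())
        .unwrap_or_else(|| Decl::Unimplemented(src.trim().to_string()))
}
```

Calling lower_decl on the DCL-S line from hello.rpg yields a Standalone node with name greeting and length 25, while a line such as DCL-F custfile DISK; comes back as Unimplemented with its text intact.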
Stage 3 — LLVM code generation (src/codegen.rs)
The typed Program is handed to the code generator, which uses inkwell (safe Rust bindings to LLVM 21) to build an LLVM IR module:
- Each DCL-PROC … END-PROC becomes an LLVM function.
- An exported procedure named main (or the first exported procedure) is wrapped in a C main() entry point so the resulting binary is directly executable.
- DCL-S standalone variables are allocated as alloca stack slots inside their owning function, or as LLVM global variables for module-scope declarations.
- String literals are stored as null-terminated byte arrays in .rodata.
- DSPLY expr; is lowered to a call to rpg_dsply(ptr, len) (or rpg_dsply_i64 / rpg_dsply_f64 for numeric types) provided by the runtime library.
- Control-flow constructs (IF, DOW, DOU, FOR, SELECT) are lowered to LLVM basic blocks and conditional / unconditional branches.
- LEAVE / ITER are lowered to br to the loop-exit / loop-header block respectively, tracked via a FnState per function.
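The loop lowering in the list above can be sketched without LLVM by emitting textual pseudo-IR. This is an illustration under stated assumptions, not the code in src/codegen.rs: the Stmt enum, the fields of FnState, the label names, and the br / br_if text format are all invented here, and the real generator builds blocks through inkwell rather than strings. The point is the per-function stack of (header, exit) labels that lets ITER and LEAVE find their branch targets.

```rust
/// Hypothetical statement subset; the real AST in src/ast.rs is richer.
pub enum Stmt {
    Dow { cond: String, body: Vec<Stmt> },
    Leave,
    Iter,
    Other(String),
}

/// Per-function lowering state (field layout is an assumption).
#[derive(Default)]
pub struct FnState {
    next_loop: usize,
    /// (header label, exit label) for each enclosing loop, innermost last.
    loops: Vec<(String, String)>,
    /// Emitted pseudo-IR, one instruction or label per entry.
    pub lines: Vec<String>,
}

impl FnState {
    pub fn lower(&mut self, stmt: &Stmt) {
        match stmt {
            Stmt::Dow { cond, body } => {
                let id = self.next_loop;
                self.next_loop += 1;
                let header = format!("dow{}.header", id);
                let body_bb = format!("dow{}.body", id);
                let exit = format!("dow{}.exit", id);
                // Fall into the header, which tests the condition first.
                self.lines.push(format!("br {}", header));
                self.lines.push(format!("{}:", header));
                self.lines.push(format!("br_if {}, {}, {}", cond, body_bb, exit));
                self.lines.push(format!("{}:", body_bb));
                self.loops.push((header.clone(), exit.clone()));
                for s in body {
                    self.lower(s);
                }
                self.loops.pop();
                // Back edge, then the block that follows the loop.
                self.lines.push(format!("br {}", header));
                self.lines.push(format!("{}:", exit));
            }
            // LEAVE jumps to the exit block of the innermost loop.
            Stmt::Leave => {
                let target = self.loops.last().expect("LEAVE outside a loop").1.clone();
                self.lines.push(format!("br {}", target));
            }
            // ITER jumps back to the header of the innermost loop.
            Stmt::Iter => {
                let target = self.loops.last().expect("ITER outside a loop").0.clone();
                self.lines.push(format!("br {}", target));
            }
            Stmt::Other(text) => self.lines.push(text.clone()),
        }
    }
}
```

One simplification to keep in mind: real LLVM IR allows only one terminator per basic block, so an actual generator would start a fresh block after each LEAVE or ITER branch. The sketch simply keeps appending lines.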
The module is then compiled to a native .o object file for the host target via LLVM's target machine API, with optional optimisation passes (-O0 through -O3).
Stage 4 — Linking
The object file is linked into a standalone ELF executable by invoking the system C compiler (cc). The executable is linked against librpgrt.so.
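A minimal sketch of what such a linker invocation could look like with std::process::Command. The function name link_command and the exact flags are assumptions for illustration; the source only states that cc is invoked and that the result links against librpgrt.so. Here -L and -lrpgrt locate the runtime at link time, and an rpath entry lets the resulting ELF find it at run time.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a `cc` command that links `object` into `output`.
/// The flag set is an assumption, not taken from src/main.rs.
pub fn link_command(object: &Path, output: &Path, runtime_dir: &Path) -> Command {
    let mut cmd = Command::new("cc");
    cmd.arg(object)
        .arg("-o")
        .arg(output)
        // Directory that is assumed to contain librpgrt.so.
        .arg("-L")
        .arg(runtime_dir)
        .arg("-lrpgrt")
        // Record the directory so the dynamic loader finds the library.
        .arg(format!("-Wl,-rpath,{}", runtime_dir.display()));
    cmd
}
```

A caller would then run cmd.status() and treat a non-zero exit status as a link failure.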
Runtime library (rpgrt/)
rpgrt is a separate Cargo crate built as a cdylib, producing librpgrt.so. It is written in Rust and exports a C ABI used by compiled RPG programs:
| Symbol | Signature | Purpose |
|---|---|---|
| rpg_dsply | (ptr: *const u8, len: i64) | Display a fixed-length CHAR field (trims trailing spaces) |
| rpg_dsply_cstr | (ptr: *const c_char) | Display a null-terminated C string |
| rpg_dsply_i64 | (n: i64) | Display a signed 64-bit integer |
| rpg_dsply_f64 | (f: f64) | Display a double-precision float |
| rpg_halt | (code: i32) | Abnormal program termination |
| rpg_memset_char | (ptr, len, fill) | Fill a char buffer with a repeated byte |
| rpg_move_char | (dst, dst_len, src, src_len) | Copy between fixed-length char fields (pad / truncate) |
| rpg_trim | (dst, src, src_len) -> i64 | Trim leading and trailing spaces, return trimmed length |
| rpg_len | (len: i64) -> i64 | Identity: returns the static %LEN of a field |
| rpg_scan | (needle, n_len, haystack, h_len, start) -> i64 | %SCAN substring search |
| rpg_subst | (src, src_len, start, length, dst, dst_len) | %SUBST extraction |
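To make the pad / truncate and %SCAN semantics concrete, here is a sketch of the logic behind two of these symbols, written as safe slice-based helpers. The helper names move_char and scan are hypothetical, and the exported C ABI functions would wrap raw pointers and lengths around logic like this; the pointer types are not specified in the table above, so they are left out. The sketch assumes blank padding with ASCII spaces and the usual RPG convention that %SCAN positions are 1-based with 0 meaning "not found".

```rust
/// Copy `src` into the fixed-length field `dst`:
/// truncate when `src` is longer, pad with blanks when it is shorter.
pub fn move_char(dst: &mut [u8], src: &[u8]) {
    let n = dst.len().min(src.len());
    dst[..n].copy_from_slice(&src[..n]);
    dst[n..].fill(b' ');
}

/// %SCAN-style search: 1-based position of the first occurrence of `needle`
/// in `haystack` at or after the 1-based `start`, or 0 when there is none.
pub fn scan(needle: &[u8], haystack: &[u8], start: i64) -> i64 {
    if needle.is_empty() || start < 1 {
        return 0;
    }
    let from = (start - 1) as usize;
    if from >= haystack.len() {
        return 0;
    }
    haystack[from..]
        .windows(needle.len())
        .position(|window| window == needle)
        .map_or(0, |offset| (from + offset + 1) as i64)
}
```

For example, moving b"Hello" into an 8-byte field leaves three trailing blanks, and scanning "Hello, World!" for "lo" from position 1 gives 4.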
DSPLY output is written to stdout and flushed immediately, mirroring IBM i's interactive operator message queue format:
DSPLY Hello, World!
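The display behaviour described above (trim trailing spaces, prefix with DSPLY, write to stdout, flush) can be sketched as follows. The helper names format_dsply and dsply_to are hypothetical, the writer is generic so the logic can be exercised without touching stdout, and treating the field bytes as UTF-8 is an assumption; the exact spacing after DSPLY in the real runtime may also differ.

```rust
use std::io::{self, Write};

/// Render a fixed-length CHAR field the way the sample output shows it,
/// dropping trailing blanks first.
pub fn format_dsply(field: &[u8]) -> String {
    let end = field
        .iter()
        .rposition(|&b| b != b' ')
        .map_or(0, |last| last + 1);
    format!("DSPLY {}", String::from_utf8_lossy(&field[..end]))
}

/// Write one DSPLY line to `out` and flush immediately.
pub fn dsply_to<W: Write>(out: &mut W, field: &[u8]) -> io::Result<()> {
    writeln!(out, "{}", format_dsply(field))?;
    out.flush()
}
```

A runtime entry point would call something like dsply_to(&mut io::stdout(), bytes) after building the byte slice from the pointer and length it receives.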
Project layout
rust-langrpg/
├── src/
│   ├── rpg.bnf      - RPG IV free-format BNF grammar (embedded at compile time)
│   ├── lib.rs       - Grammar loader and demo helpers
│   ├── ast.rs       - Typed AST node definitions
│   ├── lower.rs     - Tokenizer + recursive-descent lowering pass
│   ├── codegen.rs   - LLVM IR code generation (inkwell)
│   ├── main.rs      - Compiler CLI (clap) + linker invocation
│   └── bin/
│       └── demo.rs  - Grammar demo binary
├── rpgrt/
│   └── src/
│       └── lib.rs   - Runtime library (librpgrt.so)
├── hello.rpg        - Hello World example program
└── count.rpg        - Counting loop example program