# Rust RPG Lang

An implementation of the RPG language from IBM.

Language reference: https://www.ibm.com/docs/en/i/7.5.0?topic=introduction-overview-rpg-iv-programming-language

## Usage

### Building

```sh
cargo build --release
```

### Compiling an RPG IV program

```sh
cargo run --release -- -o hello hello.rpg
./hello
```

You will see output similar to:

```text
DSPLY Hello, World!
```

### Hello World in RPG IV

The following is a complete Hello World program written in RPG IV free-format syntax:

`hello.rpg`:

```
CTL-OPT DFTACTGRP(*NO);

DCL-S greeting CHAR(25) INZ('Hello, World!');

DCL-PROC main EXPORT;
  DSPLY greeting;
  RETURN;
END-PROC;
```

Breaking it down:

- `CTL-OPT DFTACTGRP(*NO);` is a control option specification declaring that the program does not run in the default activation group.
- `DCL-S greeting CHAR(25) INZ('Hello, World!');` declares a standalone variable: a 25-character field initialised to `'Hello, World!'`.
- `DCL-PROC main EXPORT; ... END-PROC;` defines a procedure named `main`, exported so the compiler can use it as the program entry point.
- `DSPLY greeting;` displays the value of `greeting`. On IBM i this goes to the operator message queue; programs built with this compiler write it to stdout.
- `RETURN;` returns from the procedure.

### Compiler options

```
rust-langrpg [OPTIONS] ...

Arguments:
  ...              RPG IV source file(s) to compile

Options:
  -o               Output executable path [default: a.out]
  --emit-ir        Print LLVM IR to stdout instead of producing a binary
  -O               Optimisation level 0-3 [default: 0]
  --no-link        Produce a .o object file, skip linking
  --runtime        Path to librpgrt.so [default: auto-detect]
  -h, --help       Print help
  -V, --version    Print version
```

## Architecture

The compiler is split across two crates in a Cargo workspace:

| Crate | Role |
|-------|------|
| `rust-langrpg` | Compiler front-end, mid-end, and LLVM back-end |
| `rpgrt` | C-compatible runtime shared library (`librpgrt.so`) |

### Compilation pipeline

```
            RPG IV source (.rpg)
                      │
                      ▼
┌───────────────────────────────────────────┐
│ 1. Parsing + lowering (src/lower.rs)      │
│    Hand-written tokenizer +               │
│    recursive-descent parser               │
│    → typed AST (src/ast.rs)               │
└─────────────────────┬─────────────────────┘
                      │ ast::Program
                      ▼
┌───────────────────────────────────────────┐
│ 2. LLVM code generation (src/codegen.rs)  │
│    inkwell bindings → LLVM IR module      │
└─────────────────────┬─────────────────────┘
                      │ .o object file
                      ▼
┌───────────────────────────────────────────┐
│ 3. Linking (cc + librpgrt.so)             │
│    Produces a standalone Linux ELF        │
└───────────────────────────────────────────┘
```

### Stage 1: Parsing and lowering to a typed AST (`src/lower.rs`)

A hand-written tokenizer and recursive-descent parser convert the raw source text directly into the typed `Program` AST defined in `src/ast.rs`. RPG IV keywords are case-insensitive, so the parser accepts mixed-case source.
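The case-insensitive keyword handling can be illustrated with a small lexer. This is a minimal sketch under stated assumptions, not the code in `src/lower.rs`: `Token`, `KEYWORDS`, `HYPHEN_PREFIXES` and `tokenize` are hypothetical names, and the keyword list is a tiny illustrative subset. Keywords are upper-cased before lookup, so `dcl-s`, `Dcl-S` and `DCL-S` all produce the same token, while hyphenated keywords are only formed after known prefixes so that `a-b` still lexes as a subtraction.

```rust
// Minimal sketch of case-insensitive keyword lexing for free-format RPG IV.
// Token, KEYWORDS, HYPHEN_PREFIXES and tokenize() are hypothetical names,
// not the API of src/lower.rs.

#[derive(Debug, PartialEq)]
enum Token {
    Keyword(String), // normalised to upper case, e.g. "DCL-S"
    Ident(String),   // user-defined names keep their original spelling
    Num(String),     // numeric literal text, e.g. "25"
    Str(String),     // contents of a '...' character literal
    Punct(char),     // ; ( ) = and other single-character symbols
}

// A small illustrative subset of reserved words, stored in upper case.
const KEYWORDS: &[&str] = &[
    "CTL-OPT", "DCL-S", "DCL-PROC", "END-PROC", "EXPORT", "CHAR", "INZ", "DSPLY", "RETURN",
];

// Word prefixes that may be followed by '-' to form a hyphenated keyword,
// so that DCL-S lexes as one word while a-b still lexes as a - b.
const HYPHEN_PREFIXES: &[&str] = &["CTL", "DCL", "END", "ON"];

fn tokenize(src: &str) -> Vec<Token> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c == '\'' {
            // Character literal: a doubled quote ('') stands for one quote.
            let mut s = String::new();
            i += 1;
            while i < chars.len() {
                if chars[i] == '\'' {
                    if i + 1 < chars.len() && chars[i + 1] == '\'' {
                        s.push('\'');
                        i += 2;
                        continue;
                    }
                    i += 1; // skip the closing quote
                    break;
                }
                s.push(chars[i]);
                i += 1;
            }
            tokens.push(Token::Str(s));
        } else if c.is_alphabetic() || c == '_' {
            let mut word = String::new();
            while i < chars.len() {
                let d = chars[i];
                let hyphen_ok =
                    d == '-' && HYPHEN_PREFIXES.contains(&word.to_ascii_uppercase().as_str());
                if d.is_alphanumeric() || d == '_' || hyphen_ok {
                    word.push(d);
                    i += 1;
                } else {
                    break;
                }
            }
            // Keyword lookup is case-insensitive: compare the upper-cased word.
            let upper = word.to_ascii_uppercase();
            if KEYWORDS.contains(&upper.as_str()) {
                tokens.push(Token::Keyword(upper));
            } else {
                tokens.push(Token::Ident(word));
            }
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_digit() || chars[i] == '.') {
                i += 1;
            }
            tokens.push(Token::Num(chars[start..i].iter().collect()));
        } else {
            // Everything else, including the '*' of special values like *NO,
            // is left to the parser as a single-character token.
            tokens.push(Token::Punct(c));
            i += 1;
        }
    }
    tokens
}

fn main() {
    for tok in tokenize("Dcl-S greeting Char(25) Inz('Hello, World!');") {
        println!("{:?}", tok);
    }
}
```

RPG IV names are also case-insensitive, so a full front-end would likely compare identifiers case-insensitively too; this sketch keeps their spelling only to show that the normalisation step applies to keyword lookup.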
The AST covers the full language surface that the compiler handles:

- **Declarations**: `CTL-OPT`, `DCL-S`, `DCL-C`, `DCL-DS`, `DCL-F`, subroutines
- **Procedures**: `DCL-PROC … END-PROC` with `DCL-PI … END-PI` parameter interfaces
- **Statements**: assignment, `IF`/`ELSEIF`/`ELSE`, `DOW`, `DOU`, `FOR`, `SELECT`/`WHEN`, `MONITOR`/`ON-ERROR`, `CALLP`, `DSPLY`, `RETURN`, `LEAVE`, `ITER`, `LEAVESR`, `EXSR`, `CLEAR`, `RESET`, and all I/O opcodes
- **Expressions**: literals, variables, qualified names (`ds.field`), arithmetic and logical operators, comparisons, and built-in functions (`%LEN`, `%TRIM`, `%SUBST`, `%SCAN`, `%EOF`, `%SIZE`, `%ADDR`, `%SQRT`, `%ABS`, `%REM`, `%DIV`, and more)
- **Types**: `CHAR`, `VARCHAR`, `INT`, `UNS`, `FLOAT`, `PACKED`, `ZONED`, `BINDEC`, `IND`, `DATE`, `TIME`, `TIMESTAMP`, `POINTER`, `LIKE`, `LIKEDS`

Unrecognised constructs produce `Statement::Unimplemented` or placeholder declaration variants rather than hard errors, so the compiler continues to lower the parts it understands.

### Stage 2: LLVM code generation (`src/codegen.rs`)

The typed `Program` is handed to the code generator, which uses [`inkwell`](https://crates.io/crates/inkwell) (safe Rust bindings to LLVM 21) to build an LLVM IR module:

- Each `DCL-PROC … END-PROC` becomes an LLVM function.
- An exported procedure named `main` (or the first exported procedure) is wrapped in a C `main()` entry point so the resulting binary is directly executable.
- `DCL-S` standalone variables are allocated as `alloca` stack slots inside their owning function, or as LLVM global variables for module-scope declarations.
- String literals are stored as null-terminated byte arrays in `.rodata`.
- `DSPLY expr;` is lowered to a call to `rpg_dsply(ptr, len)` (or `rpg_dsply_i64` / `rpg_dsply_f64` for numeric types) provided by the runtime library.
- Control-flow constructs (`IF`, `DOW`, `DOU`, `FOR`, `SELECT`) are lowered to LLVM basic blocks and conditional / unconditional branches.
- `LEAVE` and `ITER` are lowered to a `br` to the loop-exit and loop-header block respectively; the current targets are tracked in a per-function `FnState`.

The module is then compiled to a native `.o` object file for the host target through LLVM's target machine API, with optional optimisation passes (`-O0` through `-O3`).

### Stage 3: Linking

The object file is linked into a standalone ELF executable by invoking the system C compiler (`cc`). The executable is dynamically linked against `librpgrt.so`.

### Runtime library (`rpgrt/`)

`rpgrt` is a separate Cargo crate built as a `cdylib`, producing `librpgrt.so`. It is written in Rust and exports a C ABI used by compiled RPG programs:

| Symbol | Signature | Purpose |
|--------|-----------|---------|
| `rpg_dsply` | `(ptr: *const u8, len: i64)` | Display a fixed-length `CHAR` field (trims trailing spaces) |
| `rpg_dsply_cstr` | `(ptr: *const c_char)` | Display a null-terminated C string |
| `rpg_dsply_i64` | `(n: i64)` | Display a signed 64-bit integer |
| `rpg_dsply_f64` | `(f: f64)` | Display a double-precision float |
| `rpg_halt` | `(code: i32)` | Abnormal program termination |
| `rpg_memset_char` | `(ptr, len, fill)` | Fill a char buffer with a repeated byte |
| `rpg_move_char` | `(dst, dst_len, src, src_len)` | Copy between fixed-length char fields (pad / truncate) |
| `rpg_trim` | `(dst, src, src_len) -> i64` | Trim leading and trailing spaces, return trimmed length |
| `rpg_len` | `(len: i64) -> i64` | Identity: returns the static `%LEN` of a field |
| `rpg_scan` | `(needle, n_len, haystack, h_len, start) -> i64` | `%SCAN` substring search |
| `rpg_subst` | `(src, src_len, start, length, dst, dst_len)` | `%SUBST` extraction |

`DSPLY` output is written to **stdout** and flushed immediately, mirroring the format of IBM i's interactive operator message queue:

```text
DSPLY Hello, World!
```

### Project layout

```
rust-langrpg/
├── src/
│   ├── lib.rs        # Library root (re-exports ast, lower, codegen)
│   ├── ast.rs        # Typed AST node definitions
│   ├── lower.rs      # Tokenizer + recursive-descent parser + lowering pass
│   ├── codegen.rs    # LLVM IR code generation (inkwell)
│   └── main.rs       # Compiler CLI (clap) + linker invocation
├── rpgrt/
│   └── src/
│       └── lib.rs    # Runtime library (librpgrt.so)
├── hello.rpg         # Hello World example program
└── fib.rpg           # Fibonacci sequence example program
```
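The runtime table lists only signatures and one-line purposes. As a rough illustration of how a few of those entry points could behave, here is a minimal Rust sketch. It is an assumption-based reconstruction, not the `rpgrt` source: `move_char`, `trim_blanks` and `dsply_line` are hypothetical safe helpers that work on slices instead of the raw `(ptr, len)` pairs the real `rpg_move_char` and `rpg_trim` take, and only `rpg_dsply` mirrors a documented symbol and signature. The `DSPLY` line format is taken from the example output in this README.

```rust
// Hypothetical sketch of some runtime semantics; not the rpgrt source.
use std::io::{self, Write};

/// Copy `src` into the fixed-length field `dst`, truncating if `src` is longer
/// and padding with blanks if it is shorter (as described for rpg_move_char).
fn move_char(dst: &mut [u8], src: &[u8]) {
    let n = dst.len().min(src.len());
    dst[..n].copy_from_slice(&src[..n]);
    for b in dst[n..].iter_mut() {
        *b = b' ';
    }
}

/// Return `src` without leading and trailing blanks (as described for rpg_trim).
fn trim_blanks(src: &[u8]) -> &[u8] {
    let start = src.iter().position(|&b| b != b' ').unwrap_or(src.len());
    let end = src.iter().rposition(|&b| b != b' ').map_or(start, |i| i + 1);
    &src[start..end]
}

/// Format a CHAR field as a DSPLY line, trimming trailing blanks only.
fn dsply_line(field: &[u8]) -> String {
    let end = field.iter().rposition(|&b| b != b' ').map_or(0, |i| i + 1);
    format!("DSPLY {}", String::from_utf8_lossy(&field[..end]))
}

/// C-ABI entry point with the documented (ptr, len) shape.
///
/// # Safety
/// `ptr` must point to at least `len` readable bytes (assumed to be
/// guaranteed by the code the compiler generates).
#[unsafe(no_mangle)] // Rust 1.82+ spelling; older toolchains use #[no_mangle]
pub unsafe extern "C" fn rpg_dsply(ptr: *const u8, len: i64) {
    if ptr.is_null() || len < 0 {
        return; // defensive: nothing sensible to display
    }
    let field = unsafe { std::slice::from_raw_parts(ptr, len as usize) };
    let stdout = io::stdout();
    let mut out = stdout.lock();
    // Write the line and flush immediately, as the README describes for DSPLY.
    let _ = writeln!(out, "{}", dsply_line(field));
    let _ = out.flush();
}

fn main() {
    // A CHAR(25) field initialised to 'Hello, World!' and blank-padded.
    let mut greeting = [0u8; 25];
    move_char(&mut greeting, b"Hello, World!");
    unsafe { rpg_dsply(greeting.as_ptr(), greeting.len() as i64) };

    let trimmed = trim_blanks(b"  %TRIM demo  ");
    println!("[{}]", String::from_utf8_lossy(trimmed));
}
```

Keeping the blank-padding and trimming logic in safe slice-based helpers and confining the `unsafe` pointer handling to the thin `extern "C"` wrapper is one common way to structure a Rust `cdylib` that exposes a C ABI.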