# Rust RPG Lang

A Rust implementation of IBM's RPG IV language: a compiler that turns free-format RPG source into native Linux executables via LLVM.

Language reference: https://www.ibm.com/docs/en/i/7.5.0?topic=introduction-overview-rpg-iv-programming-language

## Usage

### Building

```sh
cargo build --release
```

### Running

The repository also ships a standalone `demo` binary (`src/bin/demo.rs`). It loads the embedded BNF grammar, builds a parser, and runs a suite of RPG IV snippet examples to demonstrate the grammar in action:

```sh
cargo run --bin demo
```

You will see output similar to:

```text
=== RPG IV Free-Format Parser ===

[grammar] Loaded successfully.
[parser] Built successfully (all non-terminals resolved).

=== Parsing Examples ===

┌─ simple identifier (identifier) ─────────────────────
│ source : "myVar"
│ result : OK
└──────────────────────────────────────────────
...
=== Summary ===
total : 42
matched : 42
failed : 0

All examples parsed successfully.
```

### Hello World in RPG IV

The following is a complete Hello World program written in RPG IV free-format syntax, as understood by this parser:

`hello.rpg`:

```
CTL-OPT DFTACTGRP(*NO);

DCL-S greeting CHAR(25) INZ('Hello, World!');

DCL-PROC main EXPORT;
  DSPLY greeting;
  RETURN;
END-PROC;
```

Breaking it down:

- `CTL-OPT DFTACTGRP(*NO);` — control-option specification stating that the program does not run in the default activation group
- `DCL-S greeting CHAR(25) INZ('Hello, World!');` — standalone variable declaration: a 25-character field initialised to `'Hello, World!'`, with the remaining positions blank-padded
- `DCL-PROC main EXPORT; ... END-PROC;` — a procedure named `main`, exported so it can be called as a program entry point
- `DSPLY greeting;` — displays the value of `greeting`; on IBM i this goes to the operator message queue, while this compiler's runtime prints it to stdout
- `RETURN;` — returns from the procedure

To compile this program into a native executable named `main`, run the compiler on it:

```sh
cargo run --release -- -o main hello.rpg
```

## Architecture

The compiler is split across two crates in a Cargo workspace:

| Crate | Role |
|-------|------|
| `rust-langrpg` | Compiler front-end, mid-end, and LLVM back-end |
| `rpgrt` | C-compatible runtime shared library (`librpgrt.so`) |

### Compilation pipeline

```
       RPG IV source (.rpg)
                 │
                 ▼
┌─────────────────────────────────────────┐
│ 1. BNF validation (bnf crate)           │
│    src/rpg.bnf — embedded at compile    │
│    time via include_str!                │
└────────────────┬────────────────────────┘
                 │ parse tree (validation only)
                 ▼
┌─────────────────────────────────────────┐
│ 2. Lowering pass (src/lower.rs)         │
│    Hand-written recursive-descent       │
│    tokenizer + parser → typed AST       │
└────────────────┬────────────────────────┘
                 │ ast::Program
                 ▼
┌─────────────────────────────────────────┐
│ 3. LLVM codegen (src/codegen.rs)        │
│    inkwell bindings → LLVM IR module    │
└────────────────┬────────────────────────┘
                 │ .o object file
                 ▼
┌─────────────────────────────────────────┐
│ 4. Linking (cc + librpgrt.so)           │
│    Produces a standalone Linux ELF      │
└─────────────────────────────────────────┘
```

### Stage 1 — BNF validation (`src/rpg.bnf` + `bnf` crate)

The RPG IV free-format grammar is encoded in BNF notation in `src/rpg.bnf` and embedded at compile time with `include_str!`. At startup the compiler parses the grammar with the [`bnf`](https://docs.rs/bnf/latest/bnf/) crate to build a `GrammarParser`. Each source file is validated against the top-level `<program>` rule before any further processing. This stage acts as a gate: source that does not match `<program>` is rejected with a parse error before any lowering takes place.

### Stage 2 — Lowering to a typed AST (`src/lower.rs`)

The BNF parser only validates structure; it does not produce a typed tree suitable for code generation. A hand-written tokenizer and recursive-descent parser in `lower.rs` convert the raw source text into the typed `Program` AST defined in `src/ast.rs`.

The AST covers the full language surface that the compiler handles:

- **Declarations** — `CTL-OPT`, `DCL-S`, `DCL-C`, `DCL-DS`, `DCL-F`, subroutines
- **Procedures** — `DCL-PROC … END-PROC` with `DCL-PI … END-PI` parameter interfaces
- **Statements** — assignment, `IF/ELSEIF/ELSE`, `DOW`, `DOU`, `FOR`, `SELECT/WHEN`, `MONITOR/ON-ERROR`, `CALLP`, `DSPLY`, `RETURN`, `LEAVE`, `ITER`, `LEAVESR`, `EXSR`, `CLEAR`, `RESET`, all I/O opcodes
- **Expressions** — literals, variables, qualified names (`ds.field`), arithmetic, logical operators, comparisons, built-in functions (`%LEN`, `%TRIM`, `%SUBST`, `%SCAN`, `%EOF`, `%SIZE`, `%ADDR`, `%SQRT`, `%ABS`, `%REM`, `%DIV`, and more)
- **Types** — `CHAR`, `VARCHAR`, `INT`, `UNS`, `FLOAT`, `PACKED`, `ZONED`, `BINDEC`, `IND`, `DATE`, `TIME`, `TIMESTAMP`, `POINTER`, `LIKE`, `LIKEDS`

Unrecognised constructs produce `Statement::Unimplemented` or placeholder declaration variants rather than hard errors, so the compiler continues to lower the parts it understands.

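To make the tokenizer-plus-recursive-descent approach concrete, here is a minimal sketch restricted to `DSPLY` and `RETURN`. Everything in it (`Expr`, `Statement`, `Token`, `Parser`, `lower`) is a hypothetical stand-in, not the actual code in `src/ast.rs` or `src/lower.rs`. It only mirrors the fallback described above: an unknown opcode is skipped up to its `;` and recorded as `Statement::Unimplemented`.

```rust
/// Hypothetical, heavily reduced AST; the real types live in src/ast.rs.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Str(String),
    Var(String),
}

#[derive(Debug, PartialEq)]
pub enum Statement {
    Dsply(Expr),
    Return,
    Unimplemented(String), // opcode that was skipped so lowering can continue
}

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String), // upper-cased: RPG names and opcodes are case-insensitive
    Str(String),
    Semi,
}

fn tokenize(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            _ if c.is_whitespace() => {}
            ';' => tokens.push(Token::Semi),
            '\'' => {
                // String literal: everything up to the closing quote.
                let mut s = String::new();
                loop {
                    match chars.next() {
                        Some('\'') => break,
                        Some(ch) => s.push(ch),
                        None => return Err("unterminated string literal".to_string()),
                    }
                }
                tokens.push(Token::Str(s));
            }
            _ if c.is_ascii_alphabetic() => {
                let mut word = c.to_string();
                // '-' is accepted so keywords such as END-PROC stay a single token.
                while let Some(&d) = chars.peek() {
                    if !(d.is_ascii_alphanumeric() || d == '_' || d == '-') {
                        break;
                    }
                    word.push(d);
                    chars.next();
                }
                tokens.push(Token::Ident(word.to_ascii_uppercase()));
            }
            _ => return Err(format!("unexpected character {c:?}")),
        }
    }
    Ok(tokens)
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn bump(&mut self) -> Option<Token> {
        let tok = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        tok
    }

    fn expect_semi(&mut self) -> Result<(), String> {
        match self.bump() {
            Some(Token::Semi) => Ok(()),
            other => Err(format!("expected ';', found {other:?}")),
        }
    }

    fn expr(&mut self) -> Result<Expr, String> {
        match self.bump() {
            Some(Token::Str(s)) => Ok(Expr::Str(s)),
            Some(Token::Ident(name)) => Ok(Expr::Var(name)),
            other => Err(format!("expected an expression, found {other:?}")),
        }
    }

    /// One match arm per opcode: DSPLY expr ';' | RETURN ';' | <other> ... ';'
    fn statement(&mut self) -> Result<Statement, String> {
        match self.bump() {
            Some(Token::Ident(op)) if op == "DSPLY" => {
                let e = self.expr()?;
                self.expect_semi()?;
                Ok(Statement::Dsply(e))
            }
            Some(Token::Ident(op)) if op == "RETURN" => {
                self.expect_semi()?;
                Ok(Statement::Return)
            }
            Some(Token::Ident(op)) => {
                // Unknown opcode: skip to the terminating ';' instead of failing.
                while !matches!(self.bump(), Some(Token::Semi) | None) {}
                Ok(Statement::Unimplemented(op))
            }
            other => Err(format!("expected an opcode, found {other:?}")),
        }
    }
}

/// Lowers a sequence of free-format statements into the hypothetical AST.
pub fn lower(src: &str) -> Result<Vec<Statement>, String> {
    let mut parser = Parser { tokens: tokenize(src)?, pos: 0 };
    let mut stmts = Vec::new();
    while parser.pos < parser.tokens.len() {
        stmts.push(parser.statement()?);
    }
    Ok(stmts)
}
```

With this sketch, `lower("DSPLY greeting; CLEAR greeting; RETURN;")` yields a `Dsply`, an `Unimplemented("CLEAR")`, and a `Return`. Because each opcode maps to one match arm, a hand-written parser like this can grow one construct at a time.
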
### Stage 3 — LLVM code generation (`src/codegen.rs`)

The typed `Program` is handed to the code generator, which uses [`inkwell`](https://crates.io/crates/inkwell) (safe Rust bindings to LLVM 21) to build an LLVM IR module:

- Each `DCL-PROC … END-PROC` becomes an LLVM function.
- An exported procedure named `main` (or the first exported procedure) is wrapped in a C `main()` entry point so the resulting binary is directly executable.
- `DCL-S` standalone variables are allocated as `alloca` stack slots inside their owning function, or as LLVM global variables for module-scope declarations.
- String literals are stored as null-terminated byte arrays in `.rodata`.
- `DSPLY expr;` is lowered to a call to `rpg_dsply(ptr, len)` (or `rpg_dsply_i64` / `rpg_dsply_f64` for numeric types) provided by the runtime library.
- Control-flow constructs (`IF`, `DOW`, `DOU`, `FOR`, `SELECT`) are lowered to LLVM basic blocks and conditional / unconditional branches.
- `LEAVE` / `ITER` are lowered to `br` to the loop-exit / loop-header block respectively, tracked via a `FnState` per function.

The module is then compiled to a native `.o` object file for the host target via LLVM's target machine API, with optional optimisation passes (`-O0` through `-O3`).

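The `LEAVE` / `ITER` bookkeeping can be pictured as a stack of enclosing loops. The sketch below is an assumption about how `FnState` might track this, not its actual definition: block handles are plain integers instead of inkwell `BasicBlock`s, and branches are recorded as LLVM-style text (`br label %bbN`) rather than built with an IR builder. `BlockId`, `with_loop`, `emit_leave`, and `emit_iter` are hypothetical names.

```rust
/// Stand-in for an LLVM basic-block handle (an inkwell BasicBlock in real code).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BlockId(pub usize);

/// Hypothetical per-function codegen state; only loop tracking is modelled.
#[derive(Debug, Default)]
pub struct FnState {
    next_block: usize,
    /// Enclosing loops, innermost last: (header block for ITER, exit block for LEAVE).
    loops: Vec<(BlockId, BlockId)>,
    /// Branches emitted so far, as LLVM-style text instead of real instructions.
    pub emitted: Vec<String>,
}

impl FnState {
    pub fn new_block(&mut self) -> BlockId {
        let id = BlockId(self.next_block);
        self.next_block += 1;
        id
    }

    /// Runs `body` (the loop's statements) with this loop registered as the
    /// innermost one, and unregisters it afterwards.
    pub fn with_loop<T>(&mut self, header: BlockId, exit: BlockId, body: impl FnOnce(&mut Self) -> T) -> T {
        self.loops.push((header, exit));
        let result = body(&mut *self);
        self.loops.pop();
        result
    }

    /// LEAVE: unconditional branch to the innermost loop's exit block.
    pub fn emit_leave(&mut self) -> Result<(), String> {
        let &(_, exit) = self.loops.last().ok_or("LEAVE outside of a loop")?;
        self.emitted.push(format!("br label %bb{}", exit.0));
        Ok(())
    }

    /// ITER: unconditional branch back to the innermost loop's header block.
    pub fn emit_iter(&mut self) -> Result<(), String> {
        let &(header, _) = self.loops.last().ok_or("ITER outside of a loop")?;
        self.emitted.push(format!("br label %bb{}", header.0));
        Ok(())
    }
}
```

Because the stack is pushed and popped around each loop body, a `LEAVE` inside a nested `DOW` targets the inner loop's exit, while a later `ITER` in the outer body still branches to the outer loop's header. A `LEAVE` outside any loop surfaces as an error instead of a dangling branch.
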
### Stage 4 — Linking

The object file is linked into an ELF executable by invoking the system C compiler (`cc`). The executable is dynamically linked against `librpgrt.so`, so the dynamic loader must be able to find that library when the program runs.

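As a rough sketch of this step, the function below assembles a `cc` command line with `std::process::Command` but does not run it. The `link_command` name, the `-L` / `-lrpgrt` pair, and the `-Wl,-rpath` flag are assumptions about how the link could be done, not the compiler's actual invocation. The rpath entry is one common way to let the loader find `librpgrt.so` without setting `LD_LIBRARY_PATH`.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds (but does not run) a `cc` invocation that links
/// `object` into `output` against librpgrt.so located in `rt_dir`.
pub fn link_command(object: &Path, output: &Path, rt_dir: &Path) -> Command {
    let mut cmd = Command::new("cc");
    cmd.arg(object)
        .arg("-o")
        .arg(output)
        // Link-time search path and library name (librpgrt.so -> -lrpgrt).
        .arg("-L")
        .arg(rt_dir)
        .arg("-lrpgrt")
        // Record rt_dir in the executable so the dynamic loader finds the runtime.
        .arg(format!("-Wl,-rpath,{}", rt_dir.display()));
    cmd
}
```

Calling `.status()` on the returned `Command` would perform the link. Building the command separately keeps the argument list easy to inspect or log when a link fails.
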
### Runtime library (`rpgrt/`)

`rpgrt` is a separate Cargo crate built as a `cdylib`, producing `librpgrt.so`. It is written in Rust and exports a C ABI used by compiled RPG programs:

| Symbol | Signature | Purpose |
|--------|-----------|---------|
| `rpg_dsply` | `(ptr: *const u8, len: i64)` | Display a fixed-length `CHAR` field (trims trailing spaces) |
| `rpg_dsply_cstr` | `(ptr: *const c_char)` | Display a null-terminated C string |
| `rpg_dsply_i64` | `(n: i64)` | Display a signed 64-bit integer |
| `rpg_dsply_f64` | `(f: f64)` | Display a double-precision float |
| `rpg_halt` | `(code: i32)` | Abnormal program termination |
| `rpg_memset_char` | `(ptr, len, fill)` | Fill a char buffer with a repeated byte |
| `rpg_move_char` | `(dst, dst_len, src, src_len)` | Copy between fixed-length char fields (pad / truncate) |
| `rpg_trim` | `(dst, src, src_len) -> i64` | Trim leading and trailing spaces, return trimmed length |
| `rpg_len` | `(len: i64) -> i64` | Identity: returns the static `%LEN` of a field |
| `rpg_scan` | `(needle, n_len, haystack, h_len, start) -> i64` | `%SCAN` substring search |
| `rpg_subst` | `(src, src_len, start, length, dst, dst_len)` | `%SUBST` extraction |

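The fixed-length `CHAR` semantics behind `rpg_move_char` and `rpg_scan` are easy to get wrong at the byte level, so here is a hedged sketch of both in safe Rust, plus one `extern "C"` wrapper showing how a raw `(ptr, len)` pair could be turned into slices. The helper names `move_char` and `scan`, the `i64` lengths on the wrapper, and the 1-based `start` are assumptions modelled on the table and on RPG's `%SCAN` behaviour (1-based result, 0 when not found), not code taken from `rpgrt`.

```rust
use std::slice;

/// Copy between fixed-length CHAR fields: take as much of `src` as fits,
/// then blank-pad the rest of `dst` (longer sources are truncated).
pub fn move_char(dst: &mut [u8], src: &[u8]) {
    let n = dst.len().min(src.len());
    dst[..n].copy_from_slice(&src[..n]);
    dst[n..].fill(b' ');
}

/// %SCAN-style search: 1-based position of `needle` in `haystack`, starting at the
/// 1-based position `start`; 0 when there is no match. This sketch also returns 0
/// for an empty needle or an out-of-range `start`.
pub fn scan(needle: &[u8], haystack: &[u8], start: usize) -> usize {
    if needle.is_empty() || start == 0 || start > haystack.len() {
        return 0;
    }
    haystack[start - 1..]
        .windows(needle.len())
        .position(|window| window == needle)
        .map_or(0, |offset| start + offset)
}

/// C-ABI wrapper shaped like the `rpg_move_char` row above (parameter types assumed).
/// A real cdylib would also mark it `#[no_mangle]` (`#[unsafe(no_mangle)]` in
/// edition 2024) so the symbol keeps its C name.
///
/// # Safety
/// `dst`/`src` must be valid for `dst_len`/`src_len` bytes and must not overlap.
pub unsafe extern "C" fn rpg_move_char(dst: *mut u8, dst_len: i64, src: *const u8, src_len: i64) {
    if dst.is_null() || dst_len <= 0 {
        return;
    }
    let dst = unsafe { slice::from_raw_parts_mut(dst, dst_len as usize) };
    let src: &[u8] = if src.is_null() || src_len <= 0 {
        &[]
    } else {
        unsafe { slice::from_raw_parts(src, src_len as usize) }
    };
    move_char(dst, src);
}
```

Keeping the byte logic in plain slice functions means the `extern "C"` layer only validates pointers and lengths, and the semantics can be tested without any unsafe code.
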
`DSPLY` output is written to **stdout** and flushed immediately, mirroring IBM i's interactive operator message queue format:

```text
DSPLY  Hello, World!
```

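A minimal sketch of the `DSPLY` path, assuming the behaviour described above: trailing blanks are trimmed, the line is written to stdout, and stdout is flushed. The `format_dsply` and `dsply_to` helpers are hypothetical, and the `DSPLY` prefix with a two-space gap is copied from the sample output rather than from the runtime source.

```rust
use std::io::{self, Write};

/// Formats one DSPLY line for a fixed-length CHAR field, dropping trailing blanks.
pub fn format_dsply(field: &[u8]) -> String {
    let end = field.iter().rposition(|&b| b != b' ').map_or(0, |i| i + 1);
    format!("DSPLY  {}", String::from_utf8_lossy(&field[..end]))
}

/// Writes the formatted line and flushes right away.
pub fn dsply_to<W: Write>(out: &mut W, field: &[u8]) -> io::Result<()> {
    writeln!(out, "{}", format_dsply(field))?;
    out.flush()
}

/// C-ABI entry point shaped like the `rpg_dsply(ptr, len)` row in the table.
///
/// # Safety
/// `ptr` must be valid for `len` readable bytes.
pub unsafe extern "C" fn rpg_dsply(ptr: *const u8, len: i64) {
    let field: &[u8] = if ptr.is_null() || len <= 0 {
        &[]
    } else {
        unsafe { std::slice::from_raw_parts(ptr, len as usize) }
    };
    // A C caller cannot receive an I/O error, so it is ignored here.
    let _ = dsply_to(&mut io::stdout().lock(), field);
}
```

Writing through a generic `Write` keeps the formatting testable against an in-memory buffer, while the `extern "C"` function stays a thin wrapper around it.
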
### Project layout

```
rust-langrpg/
├── src/
│   ├── rpg.bnf      — RPG IV free-format BNF grammar (embedded at compile time)
│   ├── lib.rs       — Grammar loader and demo helpers
│   ├── ast.rs       — Typed AST node definitions
│   ├── lower.rs     — Tokenizer + recursive-descent lowering pass
│   ├── codegen.rs   — LLVM IR code generation (inkwell)
│   ├── main.rs      — Compiler CLI (clap) + linker invocation
│   └── bin/
│       └── demo.rs  — Grammar demo binary
├── rpgrt/
│   └── src/
│       └── lib.rs   — Runtime library (librpgrt.so)
├── hello.rpg        — Hello World example program
└── count.rpg        — Counting loop example program
```