cuda-oxide: hello-constant teardown 03 - the rustc front end turns source code into MIR

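The previous post (the cargo-oxide driver) covered process A. This one moves into process B and looks at what rustc does on its own before our backend gets to run. The rustc front end is not cuda-oxide code, but you need to understand what it produces in order to understand what the "raw material" handed to the backend looks like.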

1. The rustc front-end pipeline

Between src/main.rs and MIR, rustc goes through seven stages (① to ⑦ below); ⑧ and ⑨ belong to codegen:

src/main.rs (source text)
    │
    ▼ ① lexer            characters → tokens: fn, hello_constant, (, out, :, ...
    │
    ▼ ② parser           tokens → AST (abstract syntax tree)
    │
    ▼ ③ name resolution + macro expansion
                         resolve names, expand macros (#[kernel], gpu_printf!)
    │
    ▼ ④ HIR (High-level IR)
                         desugar: for → loop + match, while let → loop, ? operator expanded
    │
    ▼ ⑤ type checking
                         infer and check types on HIR
    │
    ▼ ⑥ THIR → MIR       lower to a CFG (control-flow graph) of basic blocks + terminators;
                         borrow checking (ownership, lifetimes) runs on this MIR
    │
    ▼ ⑦ MIR optimizations
                         MIR-level passes (dataflow-based simplification, const propagation)
    │
   MIR  ←── this is where our backend picks up its input
    │
    ▼ ⑧ Monomorphization (codegen phase)
   MIR (concrete, monomorphized instances)
    │
    ▼ ⑨ codegen_crate(tcx)  ←── librustc_codegen_cuda.so takes over

Key point: steps ① to ⑦ are identical to an ordinary Rust build. The only place our backend hooks into this pipeline is codegen_crate(tcx).
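To make the desugaring step (④) concrete, here is a minimal sketch in ordinary Rust. It compares a `for` loop and a `?` expression with hand-written approximations of what they lower to. The function names are made up for illustration, and the real lowering inside rustc differs in detail (for example, `?` goes through the `Try` trait rather than a literal `match` on `Result`).

```rust
/// Sums a slice with an ordinary `for` loop.
fn sum_for(xs: &[i32]) -> i32 {
    let mut total = 0;
    for x in xs {
        total += *x;
    }
    total
}

/// Hand-written approximation of what the `for` loop above lowers to:
/// an iterator, a `loop`, and a `match` on `next()`.
fn sum_desugared(xs: &[i32]) -> i32 {
    let mut total = 0;
    match IntoIterator::into_iter(xs) {
        mut iter => loop {
            match Iterator::next(&mut iter) {
                Some(x) => total += *x,
                None => break,
            }
        },
    }
    total
}

/// Parses and adds two integers using the `?` operator.
fn add_question(a: &str, b: &str) -> Result<i32, std::num::ParseIntError> {
    Ok(a.parse::<i32>()? + b.parse::<i32>()?)
}

/// Roughly the same logic with `?` spelled out as `match` + early return.
fn add_desugared(a: &str, b: &str) -> Result<i32, std::num::ParseIntError> {
    let x = match a.parse::<i32>() {
        Ok(v) => v,
        Err(e) => return Err(From::from(e)),
    };
    let y = match b.parse::<i32>() {
        Ok(v) => v,
        Err(e) => return Err(From::from(e)),
    };
    Ok(x + y)
}

fn main() {
    let xs = [1, 2, 3, 4];
    assert_eq!(sum_for(&xs), sum_desugared(&xs));
    assert_eq!(add_question("40", "2"), add_desugared("40", "2"));
    assert!(add_desugared("40", "oops").is_err());
    println!("sugared and desugared forms agree");
}
```

The backend never sees the sugared forms: by the time MIR exists, only the loop, match, and early-return shapes remain, already reduced further to basic blocks.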

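2. What MIR looks like

All examples in this section are taken from a real dump (the output of CUDA_OXIDE_DUMP_MIR=1 cargo oxide run hello_constant); they are not simplified versions I made up.

2.1 Smallest example: `hello_kernel`

Source: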

pub unsafe fn hello_kernel(ins: i32, out: *mut i32) {
    *out = ins + 1;
}

The MIR as dumped (verbatim):

fn hello_kernel {
    let mut _0: ();
    let _1:  i32;
    let _2:  *mut i32;
    debug "ins" => _1;
    debug "out" => _2;
    bb0: {
        (*_2) = Add(copy _1, const 1_i32)
        return
    }
}

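Seven lines of content are enough. A few details to note:

_0 is the return slot; it is declared even when the return type is ().
_1 and _2 are the parameters; debug "ins" => _1 is the source-name mapping that rustc keeps for debug info.
Statements and terminators carry no trailing semicolon here. That is a property of this dump's formatting (rustc's own MIR pretty-printer, e.g. -Zunpretty=mir, ends them with ;). Either way, this is not Rust source syntax.
Add(copy _1, const 1_i32) is an Rvalue, not a function call, so the line is a statement rather than a terminator.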

2.2 Slightly more complex: `hello_constant`

Source:

pub unsafe fn hello_constant(out: *mut i32) {
    let xxx = thread::xxx();
    gpu_printf!("thread xxx: {}", xxx);
    unsafe { *out = 42 };
}

The real MIR has 3 basic blocks in total (excerpt; the tedious StorageLive-style statements in the middle are omitted):

fn hello_constant {
    let mut _0: ();
    let _1:  *mut i32;
    let _2:  u32;
    let _3:  i32;
    ...
    debug "out" => _1;
    debug "xxx" => _2;
    ...
    bb0: {
        _2 = cuda_device::xxx() -> [return: bb1, unwind continue]
    }
    bb1: {
        StorageLive(_3)
        ...
        _3 = cuda_device::debug::__gpu_vprintf(move _6, move _8) -> [return: bb2, unwind continue]
    }
    bb2: {
        StorageDead(_8)
        ...
        (*_1) = const 42_i32
        return
    }
}

Three source lines become three blocks. Every function call ends a block: thread::xxx() splits bb0 from bb1, and __gpu_vprintf(...) splits bb1 from bb2. Section 3 explains why.

2.3 Dumping it yourself

To run it yourself and look at the real data:

RUST_LOG=info CUDA_OXIDE_DUMP_MIR=1 cargo oxide run hello_constant 2>&1 | tee /tmp/run.log

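The dump is written to stderr (the command above merges it into /tmp/run.log); search the log for fn hello_constant, for example with grep -n 'fn hello_constant' /tmp/run.log, to jump to that section.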
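If you would rather pull one function's section out of the log programmatically, a small helper along these lines can do it. This is a sketch that assumes the layout shown in the dumps above: a header line `fn <name> {` and a closing `}` at the same indentation. The helper name `extract_fn_dump` is hypothetical; if your logger prefixes each line, the matching would need to be adapted.

```rust
/// Returns the lines of the dump section for `fn_name`, or `None` if the
/// header is missing or the section is never closed.
/// Assumes a header line `fn <name> {` and a closing `}` at the same indent.
fn extract_fn_dump<'a>(log: &'a str, fn_name: &str) -> Option<Vec<&'a str>> {
    let header = format!("fn {} {{", fn_name);
    // Skip everything before the header line.
    let mut lines = log.lines().skip_while(|line| line.trim() != header.as_str());
    let first = lines.next()?;
    let indent = first.len() - first.trim_start().len();
    let mut section = vec![first];
    for line in lines {
        section.push(line);
        let line_indent = line.len() - line.trim_start().len();
        // A lone `}` at the header's indentation closes the function.
        if line.trim() == "}" && line_indent == indent {
            return Some(section);
        }
    }
    None
}

fn main() {
    let log = "unrelated log line\n\
fn hello_kernel {\n    bb0: {\n        return\n    }\n}\n\
fn hello_constant {\n    bb0: {\n        return\n    }\n}\n\
trailing noise\n";
    let section = extract_fn_dump(log, "hello_constant").expect("section present");
    assert_eq!(section.first(), Some(&"fn hello_constant {"));
    assert_eq!(section.len(), 5);
    for line in section {
        println!("{}", line);
    }
}
```

In practice you would read /tmp/run.log with `std::fs::read_to_string` and pass its contents as `log`.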

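3. Three core MIR concepts

3.1 Local (numbered local variables)

_0  ──  return value
_1  ──  1st parameter
_2  ──  2nd parameter
...
_n  ──  user variables and compiler temporaries

MIR replaces every variable name with a number. Naming information such as let xxx = ... in the source survives only in debug info; control-flow analysis looks at the numbers alone.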

3.2 Basic Block + Terminator

Each function body is cut into basic blocks bb0, bb1, bb2, ...:

bb0: {
    statement_1
    statement_2
    statement_3
    terminator     ← the last line of a block must be a terminator (no semicolon in this dump)
}

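The terminator decides where control flow goes next. Listed in the format used by the real dump:

Terminator	Example in dump format	Meaning
Call	`_3 = some_fn(move _4) -> [return: bb6, unwind continue]`	Function call; returns normally to bb6, and on unwind follows the `continue` action
Goto	`goto -> bb3`	Unconditional jump to bb3
SwitchInt	`switchInt(_2) -> [0: bb1, otherwise: bb2]`	Multi-way branch (what if and match compile to)
Return	`return`	Return from the function
Unreachable	`unreachable`	Control can never reach this point (`unreachable_unchecked()`, the impossible arm of an exhaustive match)
Drop	`drop(_4) -> [return: bb5, unwind continue]`	Runs drop glue
Assert	`assert(_3, "overflow") -> [success: bb4, unwind continue]`	Runtime check (array bounds, integer overflow)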

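What follows unwind is an UnwindAction enum value. The valid values:

UnwindAction	Meaning
`continue`	On unwind, propagate to the caller
`unreachable`	Unwinding is assumed never to happen here (common in GPU kernels)
`cleanup` (printed as `unwind: bbN`)	On unwind, go to bbN to run drop glue
`terminate(reason)`	On unwind, abort immediately

Note: unwind continue has no colon, unlike return:. The return: label is followed by a BasicBlock id (label form), while continue, unreachable and terminate(...) are bare enum values (token form). The cleanup action is the exception: it names a block, so it is printed in label form as unwind: bbN.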

This block + terminator structure is called a CFG (Control Flow Graph), one of the most common shapes for an intermediate representation in compilers.
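The CFG idea can be sketched with a toy model: each block is reduced to its terminator, and each terminator knows its successor blocks. The types below (`Bb`, `Terminator`, `hello_constant_cfg`) are hypothetical stand-ins written for this post, not rustc's real `TerminatorKind`, and unwind edges are left out for brevity.

```rust
use std::collections::BTreeSet;

/// Index of a basic block: `Bb(1)` stands for `bb1`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Bb(usize);

/// Toy subset of MIR terminators; NOT rustc's real `TerminatorKind`.
#[allow(dead_code)]
#[derive(Debug)]
enum Terminator {
    Goto(Bb),
    SwitchInt { targets: Vec<(u128, Bb)>, otherwise: Bb },
    Call { callee: &'static str, ret: Bb },
    Return,
    Unreachable,
}

impl Terminator {
    /// Blocks that control flow may enter next (unwind edges are ignored here).
    fn successors(&self) -> Vec<Bb> {
        match self {
            Terminator::Goto(bb) => vec![*bb],
            Terminator::SwitchInt { targets, otherwise } => {
                let mut next: Vec<Bb> = targets.iter().map(|(_, bb)| *bb).collect();
                next.push(*otherwise);
                next
            }
            Terminator::Call { ret, .. } => vec![*ret],
            Terminator::Return | Terminator::Unreachable => Vec::new(),
        }
    }
}

/// Hypothetical model of the three blocks in the hello_constant dump,
/// keeping only each block's terminator.
fn hello_constant_cfg() -> Vec<Terminator> {
    vec![
        Terminator::Call { callee: "cuda_device::xxx", ret: Bb(1) },
        Terminator::Call { callee: "cuda_device::debug::__gpu_vprintf", ret: Bb(2) },
        Terminator::Return,
    ]
}

/// Depth-first walk from bb0; returns every block reachable via successor edges.
fn reachable(blocks: &[Terminator]) -> BTreeSet<Bb> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![Bb(0)];
    while let Some(bb) = stack.pop() {
        if seen.insert(bb) {
            stack.extend(blocks[bb.0].successors());
        }
    }
    seen
}

fn main() {
    let cfg = hello_constant_cfg();
    let calls = cfg.iter().filter(|t| matches!(t, Terminator::Call { .. })).count();
    // Two calls give two block boundaries, hence three blocks.
    assert_eq!(calls + 1, cfg.len());
    assert_eq!(reachable(&cfg).len(), 3);
    for (i, t) in cfg.iter().enumerate() {
        println!("bb{} -> {:?}", i, t.successors());
    }
}
```

Because Call is a terminator, a call can only appear as the last line of a block. That is why each call in hello_constant ends one block and starts the next.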

3.3 Place and Rvalue

(*_1) = const 42_i32
 ↑      ↑
 Place  Rvalue

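Place: a location that can be written to, e.g. _2, (*_1), (*_1).field0, _3[_4]
Rvalue: an expression that produces a value, e.g. const 42_i32, Add(copy _1, const 1_i32), &_4 (Rvalue::Ref), copy _2

Every assignment statement has the form Place = Rvalue, with no compound expressions (a + b + c is split into two statements). The remaining statement kinds, such as StorageLive and StorageDead, are markers that compute nothing.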
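The flattening of a + b + c can be illustrated with a toy lowering pass. This is a sketch with made-up types (`Expr`, `lower`), not how rustc builds MIR. It also ignores details such as the overflow checks that debug builds add and the use of `move` for temporaries.

```rust
/// A nested source-level expression (toy model).
enum Expr {
    Var(&'static str),
    Add(Box<Expr>, Box<Expr>),
}

/// Lowers `expr` into flat `Place = Rvalue` strings appended to `out`,
/// and returns the name of the local that holds the expression's value.
fn lower(expr: &Expr, next_tmp: &mut usize, out: &mut Vec<String>) -> String {
    match expr {
        // A variable is already a local; nothing to emit.
        Expr::Var(name) => (*name).to_string(),
        Expr::Add(lhs, rhs) => {
            let l = lower(lhs, next_tmp, out);
            let r = lower(rhs, next_tmp, out);
            // Allocate a fresh temporary for this one addition.
            let tmp = format!("_{}", *next_tmp);
            *next_tmp += 1;
            out.push(format!("{} = Add(copy {}, copy {})", tmp, l, r));
            tmp
        }
    }
}

fn main() {
    // (a + b) + c with a, b, c living in _1, _2, _3.
    let expr = Expr::Add(
        Box::new(Expr::Add(
            Box::new(Expr::Var("_1")),
            Box::new(Expr::Var("_2")),
        )),
        Box::new(Expr::Var("_3")),
    );
    let mut statements = Vec::new();
    let mut next_tmp = 4;
    let result = lower(&expr, &mut next_tmp, &mut statements);
    assert_eq!(statements.len(), 2);
    assert_eq!(result, "_5");
    for s in &statements {
        println!("{}", s);
    }
    // prints:
    //   _4 = Add(copy _1, copy _2)
    //   _5 = Add(copy _4, copy _3)
}
```

Each statement performs exactly one operation on already-computed operands, which is what makes the form easy to translate one line at a time.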

Mapping this onto the real hello_kernel dump, each part can be pointed out one by one:

(*_2) = Add(copy _1, const 1_i32)
 ↑      ↑
 Place  Rvalue::BinaryOp(Add, copy _1, const 1)

copy _1 and const 1_i32 are both Operands (used inside an Rvalue), not top-level Places or Rvalues. The nesting in MIR is: Statement > Rvalue > Operand > Place / Constant.
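The Statement > Rvalue > Operand > Place / Constant nesting can be modeled with a few enums. The sketch below uses simplified types invented for this post; rustc's real types in `rustc_middle::mir` carry far more information. The `Display` impls are written to mimic the dump format shown earlier.

```rust
use std::fmt;

/// A writable location: a local `_n` or a deref of one, `(*_n)`.
#[allow(dead_code)]
enum Place { Local(usize), Deref(usize) }

/// An operand: a read of a place, or a constant.
#[allow(dead_code)]
enum Operand { Copy(Place), Move(Place), ConstI32(i32) }

enum BinOp { Add }

/// A value-producing expression built from operands.
#[allow(dead_code)]
enum Rvalue { Use(Operand), BinaryOp(BinOp, Operand, Operand) }

#[allow(dead_code)]
enum Statement { Assign(Place, Rvalue), StorageLive(usize), StorageDead(usize) }

impl fmt::Display for Place {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Place::Local(n) => write!(f, "_{}", n),
            Place::Deref(n) => write!(f, "(*_{})", n),
        }
    }
}

impl fmt::Display for Operand {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Operand::Copy(p) => write!(f, "copy {}", p),
            Operand::Move(p) => write!(f, "move {}", p),
            Operand::ConstI32(v) => write!(f, "const {}_i32", v),
        }
    }
}

impl fmt::Display for Rvalue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Rvalue::Use(op) => write!(f, "{}", op),
            Rvalue::BinaryOp(op, a, b) => {
                let name = match op { BinOp::Add => "Add" };
                write!(f, "{}({}, {})", name, a, b)
            }
        }
    }
}

impl fmt::Display for Statement {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Statement::Assign(place, rvalue) => write!(f, "{} = {}", place, rvalue),
            Statement::StorageLive(n) => write!(f, "StorageLive(_{})", n),
            Statement::StorageDead(n) => write!(f, "StorageDead(_{})", n),
        }
    }
}

fn main() {
    // hello_kernel: *out = ins + 1
    let add = Statement::Assign(
        Place::Deref(2),
        Rvalue::BinaryOp(BinOp::Add, Operand::Copy(Place::Local(1)), Operand::ConstI32(1)),
    );
    assert_eq!(add.to_string(), "(*_2) = Add(copy _1, const 1_i32)");

    // hello_constant: *out = 42
    let store = Statement::Assign(Place::Deref(1), Rvalue::Use(Operand::ConstI32(42)));
    assert_eq!(store.to_string(), "(*_1) = const 42_i32");
    println!("{}\n{}", add, store);
}
```

A backend's translation loop follows the same nesting: match on the statement, then on the Rvalue, then on each operand.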

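4. Why MIR is a good "input"

If you could choose when designing a backend, you would pick one of these four IRs to hook into:

Candidate input	Assessment
AST	Still contains high-level constructs such as `match`, `for`, closures and pattern matching; each needs its own handling
HIR	Types are not yet fully inferred; you have to work with `TyCtxt` yourself to get them
MIR ✓	Types inferred, borrows checked, control flow reduced to a CFG, expressions flattened; the backend only has to translate "locals + basic blocks"
LLVM IR	Rust-specific information is already lost (drop glue, move semantics, `*const T` vs `*mut T`)

MIR is a carefully designed "midpoint":

It keeps enough Rust information for the backend to know what the code is doing
It is simplified down to a CFG, which makes it easy for the backend to process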

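5. What is already done to the MIR the backend receives

By the time codegen_crate is called, rustc guarantees:

Guarantee	Meaning
Types inferred	`_1` always carries a concrete type such as `*mut i32`, never `_`
Borrows checked	No illegal `&mut` remains
Traits resolved	For a concrete instance, `Iterator::next` is known to resolve to one specific impl
Generics monomorphized	The backend works on concrete instances: `Vec<T>` shows up as `Vec<i32>`
Closures desugared	Each closure has its own type and body (conceptually a struct holding the captures + an impl)
Syntax sugar expanded	`for`, `while let`, `?` and the like are already expanded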
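The closure and generics rows can be pictured with a hand-written equivalent. This is a conceptual sketch: `AddK` and `apply` are names invented here, and the type rustc generates for a closure is anonymous and cannot be written out like this.

```rust
/// Hand-written stand-in for the closure `move |x| x + k`:
/// the captured variable becomes a field, the body becomes a method.
struct AddK {
    k: i32,
}

impl AddK {
    fn call(&self, x: i32) -> i32 {
        x + self.k
    }
}

/// A generic function. Each distinct `F` it is called with yields a
/// separate concrete instance when the code is monomorphized.
fn apply<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(x)
}

fn main() {
    let k = 10;
    let closure = move |x: i32| x + k;
    let manual = AddK { k };

    // The closure and its hand-written counterpart compute the same result.
    assert_eq!(apply(closure, 32), manual.call(32));

    // A second closure has a different type, so this call uses a second
    // instance of `apply`.
    assert_eq!(apply(|x| x * 2, 21), 42);
    println!("closure and struct forms agree");
}
```

A backend never has to reason about the `|x| ...` syntax or about `F` as a type parameter. It only sees concrete function bodies with concrete types.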

That leaves just one job: translating MIR into code for the target platform.

For stock rustc: LLVM IR → machine code
For our backend: dialect-mir → dialect-llvm → LLVM IR → PTX

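6. Why not run a front end inside the backend

The obvious question: should we write our own simplified version of the rustc front end?

Absolutely not. Three reasons:

Reason	Explanation
The rustc front end is hundreds of thousands of lines of high-quality code	Rewriting it would be pointless
Type inference and borrow checking are extremely complex	A rewrite would inevitably introduce bugs
Rust's semantics keep evolving	Any stable release may change the language; following rustc means keeping up automatically

The entire cuda-oxide backend is only a few tens of thousands of lines (mir-importer + mir-lower + dialect-* combined), because it only "translates" and leaves all the "understanding" to rustc.

This is the core leverage of rustc's pluggable-backend ecosystem: a very small amount of code gains access to rustc's full language capability.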

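7. One-sentence summary

The rustc front end turns source code into MIR (a CFG of basic blocks), exactly as in an ordinary Rust build. Our backend enters through the codegen_crate(tcx) hook and receives MIR that has already been type-checked, borrow-checked and monomorphized: a well-placed midpoint that keeps Rust's semantics while being simplified to a CFG that is easy to translate.

The next post moves into our own code: after rustc dlopens librustc_codegen_cuda.so, the first call is __rustc_codegen_backend(), which returns the backend object; rustc then calls its codegen_crate(tcx) method. We will take apart that code in lib.rs and see how the dual-track design (host path + device path) is implemented.

Previous in this series: cuda-oxide: hello-constant teardown 02 - what the cargo-oxide driver process does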
