EECS 4340 · Spring 2026 · Columbia University
Out-of-Order RV32IM Processor
A synthesizable, P6-style 2-way superscalar out-of-order RISC-V processor in SystemVerilog. Built on top of the Project 3 in-order pipeline, with seven advanced features layered on the base machine.
Programs passing: 33 / 33 (RTL + synthesized netlist)
Geomean cycle-count change: -27.46% (OoO base → all-on)
Geomean branch accuracy: 74.03% (+8.87 pp over bimodal)
Worst slack at 1000 ps: -797.58 ps (functionally bit-equivalent)
Architecture
(Interactive block diagram of the pipeline modules, with the advanced features highlighted in their host modules.)
Per-program results
Per-program change in cycle count between the OoO base configuration (all five ablatable advanced features disabled) and the all-on configuration (all seven features enabled).
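A geomean figure of this kind can be derived from per-program cycle counts roughly as follows. This is a minimal sketch: the function name `geomean_cycle_change` and the sample counts in the usage note are hypothetical and are not measurements from this project.

```python
import math

def geomean_cycle_change(base_cycles, allon_cycles):
    """Geometric-mean change in cycle count, in percent.

    base_cycles / allon_cycles: per-program cycle counts in the same order.
    A negative result means the all-on configuration needs fewer cycles.
    """
    ratios = [allon / base for base, allon in zip(base_cycles, allon_cycles)]
    # Averaging the logs and exponentiating gives the geometric mean.
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0
```

For example, with made-up counts `geomean_cycle_change([100, 400], [50, 200])` both programs halve their cycle count, so the result is a 50% reduction. The geometric mean is used so that long-running programs do not dominate the summary.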
Seven advanced features
Two from the difficult tier and five from the simpler tier, layered on top of the base out-of-order pipeline.
2-way Superscalar
difficult · Two-wide fetch, decode, dispatch, and commit. Doubles the IPC ceiling.
Structural — see deep dive
Early Tag Broadcast
difficult · MULT wakes its dependents one cycle before its result lands on the CDB.
Disabling raises geomean cycles by +0.10%
gshare Predictor
simpler · Global branch history XORed with PC bits to index the predictor table, so hot branches get history-specific predictions.
Disabling raises geomean cycles by +0.19%
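The gshare indexing scheme can be sketched as a small behavioral model. The history length, table size, 2-bit saturating counters, and the class name `Gshare` are assumptions for illustration; the page does not state the parameters used in the RTL.

```python
class Gshare:
    """Behavioral gshare sketch: index = (PC bits) XOR (global history)."""

    def __init__(self, index_bits=8):
        self.mask = (1 << index_bits) - 1
        self.ghr = 0                                  # global history register
        self.counters = [1] * (1 << index_bits)       # 2-bit counters, weakly not-taken

    def _index(self, pc):
        # Drop the two low PC bits (assumes 4-byte aligned instructions).
        return ((pc >> 2) ^ self.ghr) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2    # True means predict taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the resolved outcome into the history.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
```

Because the history is part of the index, a branch that strictly alternates taken and not-taken maps to two different counters and becomes predictable after warm-up, which a PC-indexed bimodal table cannot achieve.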
Return Address Stack
simpler · 16-entry hardware stack. Predicts each return target from the address of its matching call.
Disabling raises geomean cycles by +0.10%
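A 16-entry return address stack can be modeled as below. The circular overwrite on overflow and the class name `ReturnAddressStack` are assumptions; the page only gives the depth.

```python
class ReturnAddressStack:
    """Behavioral sketch of a fixed-depth return address stack."""

    def __init__(self, depth=16):
        self.depth = depth
        self.entries = [0] * depth
        self.top = 0      # index of the next free slot
        self.count = 0    # number of valid entries

    def push(self, return_addr):
        """On a call: record the address of the instruction after the call."""
        self.entries[self.top] = return_addr
        self.top = (self.top + 1) % self.depth
        # Assumed policy: when full, the oldest entry is silently overwritten.
        self.count = min(self.count + 1, self.depth)

    def pop(self):
        """On a return: predicted target, or None when the stack is empty."""
        if self.count == 0:
            return None
        self.top = (self.top - 1) % self.depth
        self.count -= 1
        return self.entries[self.top]
```

Nested calls pop in reverse order of their pushes, so each return is predicted to land just after its own call site as long as the call depth stays within 16.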
Store-to-Load Forwarding
simpler · A load behind a fully covering older store completes from the LSQ.
Disabling raises geomean cycles by +0.20%
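The "fully covering" condition can be illustrated with a short sketch. The queue representation, the function name `forward_from_store_queue`, and the policy of making the load wait on a partial overlap are assumptions, not details taken from the RTL.

```python
def forward_from_store_queue(older_stores, load_addr, load_size):
    """Decide how a load is serviced given the stores older than it.

    older_stores: list of (addr, size, data_bytes), ordered oldest to youngest.
    Returns ("forward", bytes), ("wait", None), or ("cache", None).
    """
    load_end = load_addr + load_size
    # Scan youngest first so the most recent matching store wins.
    for addr, size, data in reversed(older_stores):
        store_end = addr + size
        if addr <= load_addr and load_end <= store_end:
            offset = load_addr - addr
            return ("forward", data[offset:offset + load_size])
        if addr < load_end and load_addr < store_end:
            # Partial overlap: assumed here to block forwarding.
            return ("wait", None)
    return ("cache", None)
```

For instance, a 2-byte load at `0x102` behind a 4-byte store to `0x100` is fully covered and receives the store's upper two bytes, while a 4-byte load at `0x100` behind a younger 1-byte store to `0x100` only partially overlaps and is not forwarded in this sketch.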
Next-line Prefetch
simpler · One-line stream buffer fetches line N+1 in parallel on a miss for line N.
Disabling raises geomean cycles by +37.14%
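The effect of a one-line stream buffer on a sequential access stream can be sketched with a toy model. The line size, the unbounded cache, the class name `NextLinePrefetcher`, and the choice to re-arm the buffer with line N+1 after a buffer hit are all assumptions made for illustration.

```python
class NextLinePrefetcher:
    """Toy model: a cache plus a one-line stream buffer holding line N+1."""

    def __init__(self, line_bytes=16):
        self.line_bytes = line_bytes
        self.cache = set()        # line numbers present (capacity ignored)
        self.buffer_line = None   # line number held in the stream buffer
        self.memory_stalls = 0    # misses that had to wait on memory

    def access(self, addr):
        line = addr // self.line_bytes
        if line in self.cache:
            return "hit"
        if self.buffer_line == line:
            result = "buffer_hit"     # line was already prefetched
        else:
            self.memory_stalls += 1   # demand miss goes to memory
            result = "miss"
        self.cache.add(line)
        # Prefetch the next sequential line alongside the fill (assumed policy).
        self.buffer_line = line + 1
        return result
```

Under these assumptions, a straight-line walk through four consecutive lines stalls on memory only once, for the first line; the remaining three lines are found in the stream buffer.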
2-way Set-Associative D-Cache
simpler · 16 sets × 2 ways with 1-bit LRU. Resolves direct-mapped conflict misses.
Structural — see deep dive
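The 16 sets × 2 ways organization with one LRU bit per set can be sketched as follows. The line size and the class name `TwoWayCache` are assumptions; only tags are modeled, not data.

```python
class TwoWayCache:
    """Tag-only model of a 2-way set-associative cache with 1-bit LRU."""

    def __init__(self, num_sets=16, line_bytes=8):
        self.num_sets = num_sets
        self.line_bytes = line_bytes          # assumed line size
        self.tags = [[None, None] for _ in range(num_sets)]
        self.lru = [0] * num_sets             # per set: way to evict next

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        ways = self.tags[index]
        for way in (0, 1):
            if ways[way] == tag:
                self.lru[index] = 1 - way     # the other way is now LRU
                return True
        victim = self.lru[index]
        ways[victim] = tag
        self.lru[index] = 1 - victim
        return False
```

Two addresses that map to the same set, such as `0` and `128` with these parameters, would evict each other on every access in a direct-mapped cache. Here they occupy the two ways of one set, so alternating between them hits after the first two fills.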