Pipeline
5-Stage Instruction Isolation Mechanics
Pipelining increases instruction throughput by overlapping the execution of multiple instructions. To keep in-flight instructions from interfering with one another, the RISC-V datapath is partitioned into five execution stages. Each stage is isolated from its neighbors by edge-triggered boundary registers that act as clock-cycle checkpoints.
Every pipeline register captures the data outputs and control flags of the preceding stage on the positive (rising) edge of the system clock (SysClk) and holds them stable as inputs for the downstream stage throughout the following clock cycle.
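The capture-and-hold behavior described above can be sketched as a small behavioral model. This is only an illustration in Python, not the hardware description itself; the class name `PipelineRegister` and its `drive` and `rising_edge` methods are hypothetical names introduced for this sketch.

```python
class PipelineRegister:
    """Toy model of an edge-triggered boundary register (illustrative names)."""

    def __init__(self, reset_value):
        self.q = dict(reset_value)  # value visible to the downstream stage
        self.d = dict(reset_value)  # value driven by the upstream stage

    def drive(self, **signals):
        # Upstream combinational outputs settle on D during the cycle.
        self.d = dict(signals)

    def rising_edge(self):
        # On the rising clock edge, D is captured and then held on Q.
        self.q = dict(self.d)


reg = PipelineRegister({"value": 0})
reg.drive(value=42)
before_edge = reg.q["value"]  # downstream still sees the old value
reg.rising_edge()
after_edge = reg.q["value"]   # the new value is visible only after the edge
```

The point of the split between `d` and `q` is that a downstream stage never observes a value while the upstream logic is still settling.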
Stage-by-Stage Hardware Breakdown
1. Instruction Fetch (IF)
The Instruction Fetch stage calculates the next instruction address and reads the raw machine code word from memory.
- Active Hardware: Program Counter (PC) 32-bit register, stable synchronous Instruction Memory (IMem), and a dedicated combinational \(PC + 4\) Adder.
- Operational Flow: The 32-bit address inside the PC is driven directly onto the address bus of the Instruction Memory, yielding a 32-bit instruction machine code word. Concurrently, the adder calculates the sequential fallback tracking address (\(PC + 4\)).
- Downstream Targeting: Both the raw instruction word and the \(PC + 4\) tracking index settle at the entry boundary of the IF_ID pipeline register.
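As a rough behavioral sketch of the fetch step, the function below reads one instruction word and computes \(PC + 4\) in the same call. It assumes a word-aligned PC and models the Instruction Memory as a plain Python list of 32-bit words; the function name `instruction_fetch` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def instruction_fetch(pc, imem):
    """Return (instruction word, PC + 4) for a word-aligned 32-bit PC."""
    inst = imem[(pc & MASK32) >> 2]  # IMem modelled as a list of 32-bit words
    pc_plus_4 = (pc + 4) & MASK32    # the adder wraps at 32 bits
    return inst, pc_plus_4


imem = [0x00000013, 0x00100093]  # nop ; addi x1, x0, 1
inst, next_pc = instruction_fetch(4, imem)
```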
2. Instruction Decode (ID)
The Instruction Decode stage untangles the instruction bitfields, checks for pipeline hazards, reads source operands, and builds sign-extended constants.
- Active Hardware: Main Control Unit decoder matrix, 32-word structural Register File, Immediate Generator, and the centralized Hazard Controller.
- Operational Flow:
  - The 32-bit instruction is sliced into explicit bit ranges: opcode (bits 6:0), rd (bits 11:7), funct3 (bits 14:12), rs1 (bits 19:15), rs2 (bits 24:20), and funct7 (bits 31:25).
  - The Main Control Unit combinationally decodes the opcode to establish initial control flags.
  - The Register File performs parallel asynchronous reads on ports rs1 and rs2, outputting data onto buses RDataA and RDataB.
  - The Immediate Generator parses non-contiguous fragments to reconstruct a unified 32-bit sign-extended immediate field (Imm).
- Downstream Targeting: Decoded control flags, read operands, immediate scalars, and tracking indices are routed directly to the ID_EX register boundary.
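The field slicing and immediate reconstruction can be illustrated with a short Python sketch. The bit ranges follow the list above, and the immediate layouts follow the standard RV32I instruction formats. The function names (`decode_fields`, `imm_gen`, `sign_extend`) are hypothetical, and the opcode-to-format mapping covers only the base RV32I opcodes, which is an assumption about what this design supports.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_fields(inst):
    """Slice a 32-bit instruction into its fixed-position bitfields."""
    return {
        "opcode": inst & 0x7F,
        "rd": (inst >> 7) & 0x1F,
        "funct3": (inst >> 12) & 0x7,
        "rs1": (inst >> 15) & 0x1F,
        "rs2": (inst >> 20) & 0x1F,
        "funct7": (inst >> 25) & 0x7F,
    }


def imm_gen(inst):
    """Rebuild the sign-extended immediate from its scattered fragments."""
    opcode = inst & 0x7F
    if opcode in (0x13, 0x03, 0x67):  # I-type: imm[11:0] = inst[31:20]
        return sign_extend(inst >> 20, 12)
    if opcode == 0x23:                # S-type: inst[31:25] | inst[11:7]
        return sign_extend(((inst >> 25) << 5) | ((inst >> 7) & 0x1F), 12)
    if opcode == 0x63:                # B-type: imm[12|11|10:5|4:1], bit 0 is 0
        imm = ((((inst >> 31) & 1) << 12) | (((inst >> 7) & 1) << 11)
               | (((inst >> 25) & 0x3F) << 5) | (((inst >> 8) & 0xF) << 1))
        return sign_extend(imm, 13)
    if opcode in (0x37, 0x17):        # U-type: imm[31:12] = inst[31:12]
        return sign_extend(inst & 0xFFFFF000, 32)
    if opcode == 0x6F:                # J-type: imm[20|19:12|11|10:1], bit 0 is 0
        imm = ((((inst >> 31) & 1) << 20) | (((inst >> 12) & 0xFF) << 12)
               | (((inst >> 20) & 1) << 11) | (((inst >> 21) & 0x3FF) << 1))
        return sign_extend(imm, 21)
    return 0                          # R-type and others carry no immediate


fields = decode_fields(0x0020A423)   # sw x2, 8(x1)
store_offset = imm_gen(0x0020A423)
branch_offset = imm_gen(0xFE208EE3)  # beq x1, x2, -4
```

The sketch returns immediates as signed Python integers; in hardware the same value would appear as a 32-bit two's-complement bus.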
3. Execution (EX)
The Execution stage completes arithmetic computations, evaluates branch conditions, and determines target addresses.
- Active Hardware: Core Arithmetic Logic Unit (ALU), source multiplexer arrays (ASel and BSel), Branch Control Unit, and the Forwarding Unit.
- Operational Flow:
  - The Forwarding Unit continuously evaluates active downstream register writes against current execution sources (rs1/rs2). If a data dependency is identified, forwarding multiplexers dynamically swap stale register data with live bypass values from the EX_MEM or MEM_WB registers.
  - Multiplexers ASel and BSel finalize the core ALU inputs (e.g., selecting between bypassed register data, the active PC tracking value, or the sign-extended immediate).
  - The ALU processes the inputs based on the 5-bit ALUSel opcode to generate ALURes.
  - The Branch Control Unit evaluates conditions (e.g., equality, signed comparison) to determine if a branch is taken (Branch_Taken).
- Downstream Targeting: The computed result (ALURes), forwarded store data (MemWData), target destination index (rdi), and remaining memory/writeback control flags land at the input of the EX_MEM register.
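The forwarding decision for a single ALU operand can be sketched as a priority check. This is a minimal illustration under two assumptions that the text above does not state explicitly: the younger result in EX_MEM takes priority over the older one in MEM_WB, and register x0 is never forwarded because it is hardwired to zero in RISC-V. The function name `forward_select` and the returned labels are hypothetical.

```python
def forward_select(rs, ex_mem_rd, ex_mem_reg_wen, mem_wb_rd, mem_wb_reg_wen):
    """Pick the source for one ALU operand: 'EX_MEM', 'MEM_WB', or 'REG'."""
    if rs != 0 and ex_mem_reg_wen and ex_mem_rd == rs:
        return "EX_MEM"  # most recent producer wins (assumed priority)
    if rs != 0 and mem_wb_reg_wen and mem_wb_rd == rs:
        return "MEM_WB"
    return "REG"         # no dependency: use the value read in Decode


# Both in-flight instructions write x5; the EX stage reads x5 as rs1.
choice = forward_select(5, ex_mem_rd=5, ex_mem_reg_wen=True,
                        mem_wb_rd=5, mem_wb_reg_wen=True)
```

The same check would run once for rs1 and once for rs2, each driving its own forwarding multiplexer.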
4. Memory Access (MEM)
The Memory Access stage performs the loads from and stores to Data Memory.
- Active Hardware: Data Memory (DMem) core, combinational Store Aligner, and Load Aligner.
- Operational Flow:
  - The incoming ALURes is mapped straight to the Data Memory address bus.
  - If MemWrite is asserted high, the Store Aligner formats the data payload (MemWData) into appropriate byte lanes based on the instruction width spec (MemByteSel) before triggering the RAM cells.
  - If MemRead is asserted high, a data word is retrieved from the RAM cells, and the Load Aligner sign-extends or zero-pads the output according to the targeted load size format (byte, halfword, or full word).
- Downstream Targeting: Sized read data (MemRData), bypassed ALU outcomes (ALURes), return links (\(PC + 4\)), and target writeback metrics settle at the MEM_WB register inputs.
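The two aligners can be illustrated with the sketch below. It assumes a little-endian, word-wide memory with one write enable per byte lane and naturally aligned accesses. The encoding of MemByteSel is not given here, so the access width is passed as a plain byte count (1, 2, or 4); the function names `store_align` and `load_align` are hypothetical.

```python
def store_align(addr, data, size):
    """Return (write data shifted into place, 4-bit byte-lane enable mask)."""
    offset = addr & 0x3                        # byte position within the word
    payload = data & ((1 << (8 * size)) - 1)   # keep only the stored bytes
    lanes = (((1 << size) - 1) << offset) & 0xF
    return (payload << (8 * offset)) & 0xFFFFFFFF, lanes


def load_align(addr, word, size, signed):
    """Extract a byte, halfword, or word and sign- or zero-extend it."""
    offset = addr & 0x3
    bits = 8 * size
    value = (word >> (8 * offset)) & ((1 << bits) - 1)
    if signed and value & (1 << (bits - 1)):
        value -= 1 << bits                     # two's-complement sign extension
    return value


wdata, lanes = store_align(0x1002, 0x1234ABCD, 2)  # halfword store, upper lanes
signed_half = load_align(0x1002, wdata, 2, signed=True)
unsigned_half = load_align(0x1002, wdata, 2, signed=False)
```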
5. Write Back (WB)
The terminal Write Back stage selects and routes the finalized data payload to commit updates back into the architectural register file.
- Active Hardware: Write Back Controller multiplexer steering matrix.
- Operational Flow: The Write Back Controller acts as a multi-channel selection tree. It evaluates the 2-bit WBSel tracking flag to steer a single 32-bit data path out of three available resource channels:
  - 00: Direct ALU calculation bypass (ALURes).
  - 01: Sanitized memory read output (MemRData).
  - 11: Sequential link return address (\(PC + 4\)).
- Loopback Targeting: The selected data path loops back across the full horizontal layout of the processor schematic, terminating directly at the write port (BusW) of the Register File in the Decode stage, authorized on the clock edge by the latched register write enable flag (RegWEn).
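The selection tree reduces to a three-way multiplexer, sketched below with the WBSel encodings listed above. The behavior for the unlisted encoding 10 is not specified in this document, so the sketch raises an error for it; that choice, and the function name `write_back_select`, are assumptions made for illustration.

```python
def write_back_select(wb_sel, alu_res, mem_rdata, pc_plus_4):
    """Steer one of three sources onto BusW according to the 2-bit WBSel."""
    if wb_sel == 0b00:
        return alu_res     # arithmetic and logic results
    if wb_sel == 0b01:
        return mem_rdata   # load data from the Load Aligner
    if wb_sel == 0b11:
        return pc_plus_4   # link address for jump-and-link
    raise ValueError("WBSel encoding 0b10 is not defined in this sketch")


bus_w = write_back_select(0b11, alu_res=0x40, mem_rdata=0x99, pc_plus_4=0x104)
```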
Global Structural Timing Diagram
The diagram below maps out how independent instructions move through the decoupled execution stages across successive clock cycles under normal pipeline operation. Each instruction enters the pipeline one cycle after its predecessor:
        Cycle 1     Cycle 2     Cycle 3     Cycle 4     Cycle 5     Cycle 6
Inst 0: [ IF  ] --> [ ID  ] --> [ EX  ] --> [ MEM ] --> [ WB  ]
Inst 1:             [ IF  ] --> [ ID  ] --> [ EX  ] --> [ MEM ] --> [ WB  ]
Inst 2:                         [ IF  ] --> [ ID  ] --> [ EX  ] --> [ MEM ]
Inst 3:                                     [ IF  ] --> [ ID  ] --> [ EX  ]
Inst 4:                                                 [ IF  ] --> [ ID  ]
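The staggered occupancy can also be expressed as a formula: with no stalls or flushes, instruction \(i\) (counting from 0) occupies stage index \(c - 1 - i\) in cycle \(c\) (counting from 1). The sketch below encodes that relationship; the helper names `stage_at` and `total_cycles` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(inst_index, cycle):
    """Stage held by instruction inst_index in a 1-based cycle, or None."""
    pos = cycle - 1 - inst_index
    return STAGES[pos] if 0 <= pos < len(STAGES) else None


def total_cycles(n_instructions):
    """Cycles needed to retire n instructions in an ideal, stall-free pipeline."""
    return len(STAGES) + n_instructions - 1


cycles_for_five = total_cycles(5)
```

Five instructions therefore retire in nine cycles rather than the twenty-five a non-pipelined, five-cycle-per-instruction design would need.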
Boundary Register Micro-Logic Specifications
The pipeline relies on four discrete boundary register blocks to maintain steady state isolation. Each block handles stall and flush signals uniquely using input-side multiplexing and label tunnels.
1. IF/ID Register (components/pipeline/if-id.md)
- Inputs Captured: In_PC (32 bits), In_Inst (32 bits)
- Outputs Delivered: Out_PC (32 bits), Out_Inst (32 bits)
- Control Mechanics:
  - Controlled by the active-high write-enable flag IF_ID_Write and active-high clear flag IF_ID_Flush.
  - If IF_ID_Flush == 1 \(\rightarrow\) Outputs clear on the clock edge (Out_PC = 0x00000000, Out_Inst = 0x00000013 [Hardware NOP]).
  - If IF_ID_Flush == 0 and IF_ID_Write == 0 \(\rightarrow\) Clock updates are masked, locking the internal state to execute a pipeline stall.
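The flush and stall rules can be sketched as a per-edge update function. The model below follows the priority implied above (flush is checked first, then write enable) and assumes the register resets to the same NOP state that a flush produces, which the document does not state. The class name `IfIdRegister` and its `clock` method are hypothetical.

```python
NOP = 0x00000013  # addi x0, x0, 0


class IfIdRegister:
    """Behavioral sketch of the IF/ID boundary register."""

    def __init__(self):
        self.out_pc = 0x00000000
        self.out_inst = NOP  # assumed reset state

    def clock(self, in_pc, in_inst, if_id_write, if_id_flush):
        """Apply one rising clock edge."""
        if if_id_flush:
            self.out_pc, self.out_inst = 0x00000000, NOP  # squash
        elif if_id_write:
            self.out_pc, self.out_inst = in_pc, in_inst   # normal capture
        # otherwise hold the current contents (stall)


reg = IfIdRegister()
reg.clock(0x10, 0x00100093, if_id_write=1, if_id_flush=0)
captured = (reg.out_pc, reg.out_inst)
reg.clock(0x14, 0x00200113, if_id_write=0, if_id_flush=0)
stalled = (reg.out_pc, reg.out_inst)
reg.clock(0x18, 0x00300193, if_id_write=1, if_id_flush=1)
flushed = (reg.out_pc, reg.out_inst)
```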
2. ID/EX Register (components/pipeline/id-ex.md)
- Inputs Captured: Execution, Memory, and Writeback control bundles, source/destination tracking indices (rs1, rs2, rdi), read payloads (RDataA, RDataB), immediate scalar (Imm), and address track (IF_ID_PC).
- Outputs Delivered: Symmetrical pipeline-prefixed versions of all inputs (e.g., ID_EX_ALUSel, ID_EX_RDataA).
- Control Mechanics: Mapped directly to write-enable ID_EX_WE and flush vector ID_EX_FLUSH. Activating a flush isolates the execution stage by forcing all outgoing downstream control lines synchronously to 0, transforming the instruction into a harmless pipeline bubble.
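The bubble mechanism can be illustrated with the sketch below. It models only a small, hypothetical subset of the control bundle (RegWEn, MemWrite, MemRead) and assumes that data fields pass through a flush unchanged, since with every control line at 0 they can no longer cause a register or memory update. The function name `id_ex_next` is also hypothetical.

```python
CONTROL_FIELDS = ("RegWEn", "MemWrite", "MemRead")  # hypothetical subset


def id_ex_next(current, inputs, id_ex_we, id_ex_flush):
    """Return the ID/EX contents after one rising clock edge."""
    nxt = dict(inputs) if id_ex_we else dict(current)  # capture or hold
    if id_ex_flush:
        for name in CONTROL_FIELDS:
            nxt[name] = 0  # no register write, no memory access
    return nxt


load = {"RegWEn": 1, "MemWrite": 0, "MemRead": 1, "rdi": 5}
bubble = id_ex_next({}, load, id_ex_we=1, id_ex_flush=1)
```

With RegWEn, MemWrite, and MemRead all at 0, the instruction still advances through the remaining stages but leaves no architectural side effect.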
3. EX/MEM Register (components/pipeline/ex-mem.md)
- Inputs Captured: Memory and Writeback control lines, ALU outcome (ALURes), forwarded store data (MemWData), and destination index (rdi).
- Outputs Delivered: EX_MEM_-prefixed tracking buses.
- Control Mechanics: Mapped to write enable EX_MEM_WE and synchronous clear EX_MEM_FLUSH. Wipes trailing memory read/write commands if late exceptions occur during execution.
4. MEM/WB Register (components/pipeline/mem-wb.md)
- Inputs Captured: Writeback control lines (RegWEn, WBSel), destination pointer (rdi), memory output (MemRData), arithmetic result (ALURes), and link path (PC_4).
- Outputs Delivered: Terminal MEM_WB_-prefixed commit lines.
- Control Mechanics: Controlled by MEM_WB_WE and MEM_WB_FLUSH. This register stabilizes data returns and loops target write metrics (MEM_WB_rdi, MEM_WB_RegWEn) back to the forwarding and hazard modules to maintain hazard safety.