Division and Remainder Unit Documentation
Overview
This hardware module implements the division and remainder operations for the RISC-V M-Extension (div, divu, rem, remu).
To minimize silicon area, all four operations share a single 32-bit unsigned division core that operates only on unsigned magnitudes. Signed operations are supported by wrapping the core in conditional two's complement negation blocks that pre-process the inputs and post-process the outputs.
Signal Interfaces
Input Ports
- dataA (32-bit): Dividend input bus.
- dataB (32-bit): Divisor input bus.
- sel (3-bit): Control bus mapped directly from the instruction's funct3 field.
Output Ports
- out (32-bit): The final calculated result, routed to the CPU register file.
Control Mapping (sel / funct3)
The module uses the 3-bit sel bus to decode the exact operation and coordinate both the pre/post-processing stages and the final output selection multiplexer.
| sel (funct3) | Instruction | Operation Type | Pre/Post-Correction | Output Mux Source |
|---|---|---|---|---|
| 3'b100 | div | Signed Division | Enabled | Corrected Quotient |
| 3'b101 | divu | Unsigned Division | Bypassed | Raw Core Quotient |
| 3'b110 | rem | Signed Remainder | Enabled | Corrected Remainder |
| 3'b111 | remu | Unsigned Remainder | Bypassed | Raw Core Remainder |
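
The four encodings reduce to two independent decisions, each visible in a single bit of sel: bit 0 selects signed versus unsigned, and bit 1 selects quotient versus remainder. A minimal Python sketch of that decode follows. The names DivControl, decode_sel, signed_op, and want_remainder are illustrative and are not taken from the RTL.

```python
from typing import NamedTuple


class DivControl(NamedTuple):
    signed_op: bool       # True: pre/post sign correction enabled (div, rem)
    want_remainder: bool  # True: output mux selects the remainder (rem, remu)


def decode_sel(sel: int) -> DivControl:
    """Split the 3-bit sel (funct3) value into the two decisions in the table."""
    if not 0b100 <= sel <= 0b111:
        raise ValueError("sel is not a division/remainder funct3 encoding")
    signed_op = (sel & 0b001) == 0        # sel[0] is 0 for div/rem, 1 for divu/remu
    want_remainder = (sel & 0b010) != 0   # sel[1] is 0 for quotient, 1 for remainder
    return DivControl(signed_op, want_remainder)
```

For instance, decode_sel(0b110) yields signed_op=True and want_remainder=True, which corresponds to the rem row of the table.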
Architectural Pipeline
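Unlike addition and subtraction, division does not behave the same way on the signed and unsigned interpretations of a 32-bit word, so feeding the two's complement pattern of a negative operand into an unsigned divider yields a result that cannot be easily corrected afterwards. The division pipeline therefore uses a three-stage topology: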
1. Pre-Processing Stage (Sign Stripping)
Before entering the unsigned division core, negative signed inputs must be converted to their magnitudes (absolute values). Each input has its own conditional two's complement negation block for this purpose.
The architecture implements negation by performing a bitwise Exclusive-OR (XOR) with a sign-extended mask and adding the sign bit via the adder's carry-in:
- Dividend Pre-Correction (dataA):
  - signA = dataA[31]
  - signAExt = signA replicated across a 32-bit bus.
  - Vector fed to core: dataA_pos = 32'h0 + (dataA ^ signAExt) + signA
- Divisor Pre-Correction (dataB):
  - signB = dataB[31]
  - signBExt = signB replicated across a 32-bit bus.
  - Vector fed to core: dataB_pos = 32'h0 + (dataB ^ signBExt) + signB
Note: For divu and remu, the control logic forces signAExt and signBExt to zero and drops the carry-in, passing the raw inputs directly to the core.
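
The XOR-plus-carry-in trick can be checked with a small behavioural model. The Python sketch below is an illustration rather than the RTL. The helper names cond_negate and pre_process are assumptions, and the signed_op flag stands in for whatever control signal the decode logic actually produces.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def cond_negate(value: int, sign: int) -> int:
    """Return value unchanged when sign is 0, or its two's complement negation when sign is 1."""
    sign_ext = MASK32 if sign else 0             # sign bit replicated across 32 bits
    return ((value ^ sign_ext) + sign) & MASK32  # XOR inverts the bits, the carry-in adds 1


def pre_process(data: int, signed_op: bool) -> Tuple[int, int]:
    """Return (magnitude, sign) for one operand; sign is forced to 0 for divu/remu."""
    sign = (data >> 31) & 1 if signed_op else 0
    return cond_negate(data, sign), sign
```

For example, pre_process(0xFFFFFFF9, True) returns (7, 1) because 0xFFFFFFF9 encodes -7, while the same call with signed_op=False passes the word through unchanged with sign 0. The most negative input, 0x80000000, maps to itself, which the unsigned core reads as 2^31, its correct magnitude.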
2. Unsigned Division Core
The shared core takes dataA_pos and dataB_pos and performs the division over its execution cycles (a fixed or variable count). It produces two unsigned 32-bit results simultaneously:
- unsigned_quotient
- unsigned_remainder
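
This document does not state which algorithm the core uses. As one possibility only, the Python sketch below models a restoring shift-subtract divider that resolves one quotient bit per iteration. The function name unsigned_div_core and the 32-iteration schedule are assumptions made for illustration.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def unsigned_div_core(dividend: int, divisor: int) -> Tuple[int, int]:
    """Restoring division: returns (unsigned_quotient, unsigned_remainder)."""
    quotient = 0
    remainder = 0
    for i in range(31, -1, -1):
        # Shift the next dividend bit (MSB first) into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Subtract the divisor when it fits and record a 1 in the quotient.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient & MASK32, remainder & MASK32
```

With a zero divisor, this particular algorithm returns an all-ones quotient and the dividend as the remainder, which matches what the RISC-V specification defines for divu and remu. A different core could behave differently and would then need explicit handling.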
3. Post-Processing Stage (Sign Restoration)
Once the core finishes execution, its outputs pass through a final layer of conditional negation adders that restore the signs required by the RISC-V ISA specification.
Quotient Correction (div)
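The quotient is negated if, and only if, the signs of the two original inputs differ; a zero quotient is unaffected, since negating zero yields zero. This rule assumes a nonzero divisor: for division by zero, the RISC-V specification defines the div result as all ones (-1) regardless of the dividend's sign.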
- Control Signal (signQ): Evaluated via an XOR of the input sign bits:
  $$\text{signQ} = \text{signA} \oplus \text{signB}$$
- Correction Matrix: signQExt is generated from signQ. The final signed quotient is calculated as:
  $$\text{Final Quotient} = 32'h0 + (\text{unsigned\_quotient} \oplus \text{signQExt}) + \text{signQ}$$
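
As a worked check of the quotient formula, the Python sketch below applies it to the core output for a few sign combinations. The function name correct_quotient is hypothetical, and the sketch covers nonzero divisors only.

```python
MASK32 = 0xFFFFFFFF


def correct_quotient(unsigned_quotient: int, sign_a: int, sign_b: int) -> int:
    """Apply the signQ-controlled conditional negation to the core's quotient."""
    sign_q = sign_a ^ sign_b                # 1 only when the input signs differ
    sign_q_ext = MASK32 if sign_q else 0    # signQ replicated across 32 bits
    return ((unsigned_quotient ^ sign_q_ext) + sign_q) & MASK32


# -7 / 2: the core divides 7 by 2 and returns 3; the signs differ, so the result is -3.
q_mixed = correct_quotient(3, sign_a=1, sign_b=0)   # 0xFFFFFFFD
# -7 / -2: same core output, equal signs, so the quotient stays +3.
q_same = correct_quotient(3, sign_a=1, sign_b=1)    # 0x00000003
```

The result truncates toward zero (-3 rather than -4), which is the rounding RISC-V specifies for div.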
Remainder Correction (rem)
Per the RISC-V specification, a nonzero remainder takes the sign of the original dividend (dataA), independent of the divisor's sign.
- Control Signal (signR): Inherited directly from the dividend sign:
  $$\text{signR} = \text{signA}$$
- Correction Matrix: signRExt (which is identical to signAExt) is used to mask the remainder. The final signed remainder is calculated as:
  $$\text{Final Remainder} = 32'h0 + (\text{unsigned\_remainder} \oplus \text{signRExt}) + \text{signR}$$
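
Tying the remainder's sign to the dividend keeps the identity dividend = divisor × quotient + remainder valid under truncating division. The Python sketch below checks this for -7 rem 2. The names correct_remainder and to_signed are illustrative, not taken from the RTL.

```python
MASK32 = 0xFFFFFFFF


def correct_remainder(unsigned_remainder: int, sign_a: int) -> int:
    """Apply the signR-controlled conditional negation (signR = signA)."""
    sign_r = sign_a
    sign_r_ext = MASK32 if sign_r else 0    # signR replicated across 32 bits
    return ((unsigned_remainder ^ sign_r_ext) + sign_r) & MASK32


def to_signed(word: int) -> int:
    """Interpret a 32-bit word as a two's complement signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word


# -7 rem 2: the core sees 7 and 2 and returns remainder 1; the dividend is negative.
r = to_signed(correct_remainder(1, sign_a=1))   # -1
# With the truncated quotient -3, the identity holds: 2 * (-3) + (-1) == -7.
identity_holds = 2 * (-3) + r == -7
```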
Output Interconnection & Multiplexing
The corrected results from the post-processing stage and the raw unsigned core outputs feed the final output multiplexer. Based on the 3-bit sel signal, the selected 32-bit result is driven onto the shared execution unit output bus, giving the multiplication and division operations a uniform timing path.
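
Putting the stages together, the whole unit can be modelled in a few lines. The Python sketch below is a behavioural approximation under stated assumptions: div_unit and cond_negate are hypothetical names, Python's divmod stands in for the unsigned core, and the divisor is assumed to be nonzero.

```python
MASK32 = 0xFFFFFFFF


def cond_negate(value: int, sign: int) -> int:
    """Two's complement negation when sign is 1, pass-through when sign is 0."""
    return ((value ^ (MASK32 if sign else 0)) + sign) & MASK32


def div_unit(data_a: int, data_b: int, sel: int) -> int:
    """Model of pre-processing, core, post-processing, and output mux (data_b != 0)."""
    signed_op = (sel & 0b001) == 0        # div, rem
    want_remainder = (sel & 0b010) != 0   # rem, remu
    # Stage 1: strip signs (forced to 0 for divu/remu, so inputs pass through raw).
    sign_a = (data_a >> 31) & 1 if signed_op else 0
    sign_b = (data_b >> 31) & 1 if signed_op else 0
    a_pos = cond_negate(data_a, sign_a)
    b_pos = cond_negate(data_b, sign_b)
    # Stage 2: unsigned core (divmod is only a stand-in for the hardware divider).
    unsigned_quotient, unsigned_remainder = divmod(a_pos, b_pos)
    # Stage 3 and output mux: restore the sign of the selected result.
    if want_remainder:
        return cond_negate(unsigned_remainder, sign_a)
    return cond_negate(unsigned_quotient, sign_a ^ sign_b)
```

With dataA = 0xFFFFFFF9 and dataB = 2, the model returns 0xFFFFFFFD for div, 0x7FFFFFFC for divu, 0xFFFFFFFF for rem, and 1 for remu. The signed overflow case (0x80000000 divided by 0xFFFFFFFF) returns 0x80000000 without special handling, matching the RISC-V definition.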