Skip to content

Division and Remainder Unit Documentation

Overview

This hardware module implements the division and remainder operations for the RISC-V M-Extension (div, divu, rem, remu).

To minimize silicon area, the architecture shares a single 32-bit unsigned division core that expects strictly non-negative magnitudes. Signed operations are supported by wrapping the core with low-overhead, conditional pre-processing and post-processing mathematical negation blocks.


Signal Interfaces

Input Ports

  • dataA (32-bit): Dividend input bus.
  • dataB (32-bit): Divisor input bus.
  • sel (3-bit): Control bus mapped directly from the instruction's funct3 fields.

Output Ports

  • out (32-bit): The final calculated result, routed to the CPU register file.

Control Mapping (sel / funct3)

The module uses the 3-bit sel bus to decode the exact operation and coordinate both the pre/post-processing stages and the final output selection multiplexer.

sel (funct3) Instruction Operation Type Pre/Post-Correction Output Mux Source
3'b100 div Signed Division Enabled Corrected Quotient
3'b101 divu Unsigned Division Bypassed Raw Core Quotient
3'b110 rem Signed Remainder Enabled Corrected Remainder
3'b111 remu Unsigned Remainder Bypassed Raw Core Remainder

Architectural Pipeline

Because integer division is non-linear, the two's complement representation of negative numbers cannot be easily corrected after a raw unsigned division. The division pipeline uses a three-stage topology:

1. Pre-Processing Stage (Sign Stripping)

Before entering the unsigned division core, negative signed inputs must be converted into absolute positive magnitudes. This is achieved using a conditional two's complement negation block for each input.

The architecture implements negation by performing a bitwise Exclusive-OR (XOR) with a sign-extended mask and adding the sign bit via the adder's carry-in:

\[\text{Core Input} = 0 + (\text{Data} \oplus \text{signExt}) + \text{sign}\]
  • Dividend Pre-Correction (dataA):
  • signA = dataA[31]
  • signAExt = Replicated signA across a 32-bit bus.
  • Vector fed to core: dataA_pos = 32'h0 + (dataA ^ signAExt) + signA
  • Divisor Pre-Correction (dataB):
  • signB = dataB[31]
  • signBExt = Replicated signB across a 32-bit bus.
  • Vector fed to core: dataB_pos = 32'h0 + (dataB ^ signBExt) + signB

Note: For divu and remu, the control logic forces signAExt and signBExt to zero and drops the carry-in, passing the raw inputs directly to the core.

2. Unsigned Division Core

The shared core takes dataA_pos and dataB_pos and processes the division over its fixed or variable execution cycles. It drops out two positive 32-bit results simultaneously:

  • unsigned_quotient
  • unsigned_remainder

3. Post-Processing Stage (Sign Restoration)

Once the core finishes execution, the output values pass through a final layer of conditional negation adders to apply the signed representation attributes required by the RISC-V ISA specification.

Quotient Correction (div)

The quotient must be negative if, and only if, the signs of the two original inputs differ.

  • Control Signal (signQ): Evaluated via a bitwise XOR of the input sign bits: $\(\text{signQ} = \text{signA} \oplus \text{signB}\)$
  • Correction Matrix: signQExt is generated from signQ. The final signed quotient is calculated as: $\(\text{Final Quotient} = 32'h0 + (\text{unsigned\_quotient} \oplus \text{signQExt}) + \text{signQ}\)$

Remainder Correction (rem)

Per the RISC-V standard, the remainder must always inherit the sign of the original dividend (dataA), completely independent of the divisor's sign.

  • Control Signal (signR): Inherited directly from the dividend sign: $\(\text{signR} = \text{signA}\)$
  • Correction Matrix: signRExt (which is identical to signAExt) is used to mask the remainder. The final signed remainder is calculated as: $\(\text{Final Remainder} = 32'h0 + (\text{unsigned\_remainder} \oplus \text{signRExt}) + \text{signR}\)$

Output Interconnection & Multiplexing

The results of the post-processing stages and raw unsigned outputs feed into the final execution mux layer. Based on the 3-bit sel signal, the correct 32-bit slice is driven onto the shared execution unit output bus, ensuring a uniform timing loop across both the multiplication and division routines.