32-Bit Unified Adder/Subtractor Module

Functional Overview

The 32-Bit Unified Adder/Subtractor (AdderSub32) handles signed and unsigned addition and subtraction within the execution path. The module unifies both operations into a single 32-bit adder network rather than duplicating hardware structures.

To perform subtraction, the circuit converts the subtrahend into its two's complement form using a parallel bitwise one's complement inversion layer and a decoupled carry-in control path. This architecture processes subtraction natively and maintains modularity for cascading multi-word operations.

Architectural Mechanics & Exterior Two's Complement Execution

To evaluate subtraction ($A - B$), the arithmetic path implements the standard two's complement identity:

\[A - B = A + \overline{B} + 1\]

The hardware separates the bitwise logical negation phase from the final mathematical carry increment step:

The Programmable Inverter Array: A single 32-bit wide XOR gate array serves as a conditional bitwise inverter on the $B$ operand path.
When Sub = 0, the XOR gates act as transparent buffers, passing the operand unmodified ($B \oplus 0 = B$) to execute standard addition.
When Sub = 1, the XOR gates dynamically invert the bits ($B \oplus 1 = \overline{B}$), fulfilling the one's complement inversion required for subtraction.
Decoupled Carry-In Path: The carry-in ($C_{\text{in}}$) pin of the internal 32-bit ripple chain passes directly to the baseline bit stage ($Bit_0$) without modification or logical masking.
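
The two pieces above can be modeled at word level. The following is a minimal behavioral sketch, not the schematic itself; the names `conditional_invert` and `adder_sub32` are hypothetical and chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF  # keeps every value inside a 32-bit word


def conditional_invert(b: int, sub: int) -> int:
    """Model the XOR array: each bit of b is XORed with the 1-bit Sub line."""
    sub_bus = MASK32 if sub else 0  # Sub fanned out to all 32 XOR gates
    return (b ^ sub_bus) & MASK32


def adder_sub32(a: int, b: int, sub: int, cin: int) -> tuple[int, int]:
    """Return (Result, Cout). Cin reaches the adder unmodified."""
    total = (a & MASK32) + conditional_invert(b, sub) + (cin & 1)
    return total & MASK32, total >> 32


# 10 - 3 evaluated as 10 + ~3 + 1
result, cout = adder_sub32(10, 3, sub=1, cin=1)
print(result, cout)  # 7 1
```

Note that the model never derives the carry-in from `sub`; the caller must supply it, mirroring the decoupled carry-in path.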

Operational Mapping (Caller Responsibilities)

Because the internal carry path is transparent, the external calling control logic coordinates standalone and multi-word cascaded executions:

Standalone Addition: The control unit drives Sub = 0 and Cin = 0.
Standalone Subtraction: The control unit drives Sub = 1 and routes that same signal to Cin, so that Cin = 1 supplies the $+1$ required by the two's complement identity.
Multi-Word Cascading (64-Bit Precision): The upper-significant 32-bit module sets its Sub line to match the global operation type, while its Cin line directly captures the raw, unmodified Cout cascading from the lower-significant 32-bit module.

Intended Operation	`Sub` Line	Ext `Cin`	XOR Array ($B$)	Internal $C_0$	True Arithmetic Expression
ADD (Standalone)	`0`	`0`	Unchanged ($B$)	`0`	$A + B + 0$
ADC (64-bit Add Chain)	`0`	$Cout_{\text{low}}$	Unchanged ($B$)	$Cout_{\text{low}}$	$A + B + Cout_{\text{low}}$
SUB (Standalone)	`1`	`1`	Inverted ($\overline{B}$)	`1`	$A + \overline{B} + 1$
SBB (64-bit Sub Chain)	`1`	$Cout_{\text{low}}$	Inverted ($\overline{B}$)	$Cout_{\text{low}}$	$A + \overline{B} + Cout_{\text{low}}$
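
As an illustration of the caller responsibilities in the table, the sketch below chains two behavioral 32-bit modules into a 64-bit operation. It assumes the lower word receives Cin = Sub, as in the standalone rows, and the upper word receives the lower word's Cout. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def adder_sub32(a: int, b: int, sub: int, cin: int) -> tuple[int, int]:
    """Behavioral model of one 32-bit module: returns (Result, Cout)."""
    b_eff = (b ^ (MASK32 if sub else 0)) & MASK32  # conditional inversion
    total = (a & MASK32) + b_eff + (cin & 1)
    return total & MASK32, total >> 32


def add_sub64(a: int, b: int, sub: int) -> tuple[int, int]:
    """Caller-side wiring for a 64-bit add (sub=0) or subtract (sub=1)."""
    # Lower word: standalone ADD/SUB row, so Cin equals Sub.
    lo, cout_lo = adder_sub32(a & MASK32, b & MASK32, sub, cin=sub)
    # Upper word: ADC/SBB row, so Cin is the raw Cout of the lower word.
    hi, cout_hi = adder_sub32(a >> 32, b >> 32, sub, cin=cout_lo)
    return (hi << 32) | lo, cout_hi


diff, cout = add_sub64(0x1_0000_0000, 1, sub=1)
print(hex(diff), cout)  # 0xffffffff 1
```

In the subtraction example the lower word produces Cout = 0 (a borrow), and passing that 0 into the upper word is what decrements the upper half.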

Interface Specifications

Input Signals

Pin Name	Bit Width	Direction	Functional Description
`A[31:0]`	32	Input	Word-level Augend/Minuend vector.
`B[31:0]`	32	Input	Word-level Addend/Subtrahend vector.
`Sub`	1	Input	Operational control signal (`0` = Pass $B$ raw, `1` = Invert $B$ bitwise).
`Cin`	1	Input	Carry-in stream routed straight to the baseline bit layer ($Bit_0$).

Output Signals

Pin Name	Bit Width	Direction	Functional Description
`Result[31:0]`	32	Output	Word-level sum or difference vector.
`Cout`	1	Output	Final carry-out trailing from the highest bit layer ($Bit_{31}$).

Mathematical & Structural Formulation

The internal core is a linear chain of 32 Full Adder cells. For each bit index $i$ ($0 \le i \le 31$), the cell implements the following equations:

\[\text{Sum}_i = A_i \oplus B'_i \oplus C_i\]

\[C_{i+1} = (A_i \cdot B'_i) + (B'_i \cdot C_i) + (A_i \cdot C_i)\]

Where:

$B'_i = B_i \oplus \text{Sub}$
$C_0 = \text{Cin}$
$C_{32} = \text{Cout}$
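
The per-bit equations can be exercised with a short bit-level simulation. This is a sketch under the definitions above ($B'_i = B_i \oplus \text{Sub}$, $C_0 = \text{Cin}$, $C_{32} = \text{Cout}$); the function name `ripple_adder_sub32` is hypothetical.

```python
def ripple_adder_sub32(a: int, b: int, sub: int, cin: int) -> tuple[int, int]:
    """Evaluate the 32 full-adder equations one bit at a time."""
    carry = cin & 1  # C_0 = Cin
    result = 0
    for i in range(32):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ (sub & 1)  # B'_i = B_i XOR Sub
        sum_i = a_i ^ b_i ^ carry  # Sum_i
        carry = (a_i & b_i) | (b_i & carry) | (a_i & carry)  # C_{i+1}
        result |= sum_i << i
    return result, carry  # carry now holds C_32 = Cout


# Cross-check against ordinary integer arithmetic.
a, b = 0xDEADBEEF, 0x12345678
assert ripple_adder_sub32(a, b, 0, 0)[0] == (a + b) & 0xFFFFFFFF
assert ripple_adder_sub32(a, b, 1, 1)[0] == (a - b) & 0xFFFFFFFF
```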

Hardware Optimization Analysis (XOR vs. MUX)

An alternative design pattern utilizes a static NOT gate array coupled with a 32-bit 2-to-1 Multiplexer layer to choose between the raw $B$ bus and the inverted $\overline{B}$ bus. The conditional XOR matrix implementation was selected due to specific layout and performance advantages:

Gate Count Minimization: A gate-level 2-to-1 multiplexer requires two AND gates, one NOT gate, and one OR gate per bit slice, in addition to the separate NOT array on the $B$ bus. A single XOR gate per bit replaces all of these, which reduces both gate count and routing density across the 32-bit bus.
Shallow Propagation Depth: The XOR matrix achieves both data propagation and conditional logical inversion in a single gate layer, bypassing the sequential delays introduced by separate negation arrays and selection multiplexers.
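
The two alternatives are functionally identical per bit, which a four-row check confirms. The sketch below assumes the gate-level multiplexer structure described above; `mux_select` and `xor_select` are hypothetical names.

```python
def mux_select(b_i: int, sub: int) -> int:
    """NOT array plus gate-level 2-to-1 MUX: picks B when Sub=0, NOT B when Sub=1."""
    not_b = b_i ^ 1  # discrete negation array
    not_sub = sub ^ 1  # select-line inverter inside the MUX
    return (b_i & not_sub) | (not_b & sub)  # two ANDs feeding one OR


def xor_select(b_i: int, sub: int) -> int:
    """Conditional inverter built from a single XOR gate."""
    return b_i ^ sub


for b_i in (0, 1):
    for sub in (0, 1):
        assert mux_select(b_i, sub) == xor_select(b_i, sub)
```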

Architectural Comparison & Design Trade-offs

To evaluate this baseline execution path against alternative designs, the matrix below details the performance, hardware area, and layout complexity profiles of different 32-bit adder topologies.

Adder Topology	Propagation Delay (Time Complexity)	Hardware Footprint (Area Complexity)	Logisim Layout / Wiring Complexity	Trade-off Analysis & Architectural Notes
Ripple Carry (RCA) (Current Design)	$O(N)$ (32 cascaded carry stages, about 2 gate delays each)	$O(N)$ (Minimal)	Very Low (Highly uniform, linear cascading chain)	Selected Baseline: Smallest silicon area and cleanest routing profile. The linear delay makes the carry chain the critical path at high clock frequencies, because the upper sum bits cannot settle until the lower carries have stabilized.
Carry Look-Ahead (CLA)	$O(\log N)$	$O(N \log N)$	High (Complex parallel logic networks)	Evaluates all carries simultaneously using parallel generate ($G$) and propagate ($P$) equations. Drastically accelerates computation speed, but flat 32-bit implementation is constrained by high gate fan-in requirements, necessitating multi-level 4-bit grouped block schemes.
Carry Select (CSelA)	$O(\sqrt{N})$	$O(N)$ (roughly $2\times$ RCA)	Medium-High (Duplicated cells + wide MUX arrays)	Splits the word into blocks and computes each block's sum twice in parallel (one copy assuming a `0` carry-in, the other assuming `1`). When a block's true carry-in stabilizes, a multiplexer selects the correct result. Cuts delay at the cost of roughly doubling adder area.
Kogge-Stone (Parallel Prefix)	$O(\log N)$	$O(N \log N)$	Extreme (Massive, overlapping wire trees)	A parallel prefix tree adder widely used in high-frequency ALUs. It is among the lowest-latency topologies, but its dense web of overlapping signal nets makes manual schematic layout impractical.
Carry Skip / Bypass	$O(\sqrt{N})$	$O(N)$	Medium (Standard RCA cells + bypass AND MUXing)	Identifies blocks of bits where a carry is guaranteed to propagate through completely, allowing the carry bit to skip entire multi-bit stages through a bypass multiplexer lane. Modest performance boost over RCA with minimal extra gating area.
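
To make the generate ($G$) and propagate ($P$) terminology in the table concrete, the sketch below computes each carry directly from $G$, $P$ and $C_0$ using the expanded look-ahead expression, then checks it against the ripple recurrence. It only illustrates the Boolean identity; it says nothing about gate delay, and all function names are hypothetical.

```python
def gp_signals(a: int, b: int, n: int = 32) -> tuple[list[int], list[int]]:
    """Per-bit generate (A AND B) and propagate (A XOR B) signals."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    return g, p


def lookahead_carry(g: list[int], p: list[int], cin: int, i: int) -> int:
    """C_{i+1} = G_i + P_i G_{i-1} + ... + P_i ... P_0 C_0, without earlier carries."""
    carry = g[i]
    prop = p[i]  # running product P_i ... P_{j+1}
    for j in range(i - 1, -1, -1):
        carry |= prop & g[j]
        prop &= p[j]
    return carry | (prop & cin)


def ripple_carries(a: int, b: int, cin: int, n: int = 32) -> list[int]:
    """Reference carries C_0..C_n from the full-adder majority recurrence."""
    carries = [cin & 1]
    for i in range(n):
        a_i, b_i, c = (a >> i) & 1, (b >> i) & 1, carries[-1]
        carries.append((a_i & b_i) | (b_i & c) | (a_i & c))
    return carries


a, b = 0xDEADBEEF, 0x12345678
g, p = gp_signals(a, b)
assert [lookahead_carry(g, p, 0, i) for i in range(32)] == ripple_carries(a, b, 0)[1:]
```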

Half Adder Primitive Component

Functional Overview

The Half Adder (HA) is the foundational, single-bit combinational building block for arithmetic logic within the processor architecture. It handles the arithmetic addition of two single-bit binary inputs. Because it lacks an incoming carry input ($C_{\text{in}}$), it represents the localized starting point of binary addition before cascading logic is introduced.

Interface Specifications

Input Signals

Pin Name	Bit Width	Direction	Functional Description
`A`	1	Input	Augend bit (Operand A, bit position $i$)
`B`	1	Input	Addend bit (Operand B, bit position $i$)

Output Signals

Pin Name	Bit Width	Direction	Functional Description
`Sum`	1	Output	The localized binary sum of inputs `A` and `B`
`Carry`	1	Output	The carry bit generated when both inputs are asserted (sum of $2_{10}$, i.e. $10_2$)

Mathematical & Boolean Formulations

The Half Adder is purely combinational: its outputs depend only on the current values of its inputs.

The Sum Column ($Sum$)

The sum output tracks whether the total number of asserted inputs is odd. It is mapped via a two-input Exclusive-OR (XOR) logic gate: $$Sum = A \oplus B$$

The Carry-Out Column ($Carry$)

The carry output acts as a logical mask that detects when both input bits are high, demanding an arithmetic carry to the next power-of-two significance column. It is mapped via a two-input AND logic gate: $$Carry = A \cdot B$$

Truth Table Matrix

Input `A`	Input `B`	Output `Carry`	Output `Sum`	Binary Representation	Equivalent Decimal Evaluation
`0`	`0`	`0`	`0`	`00`	$0 + 0 = 0$
`0`	`1`	`0`	`1`	`01`	$0 + 1 = 1$
`1`	`0`	`0`	`1`	`01`	$1 + 0 = 1$
`1`	`1`	`1`	`0`	`10`	$1 + 1 = 2$
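
The truth table can be reproduced from the two Boolean equations. This is a minimal sketch; the function name `half_adder` and the `(carry, sum)` return order are illustrative choices.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (Carry, Sum) for two 1-bit inputs."""
    return a & b, a ^ b  # Carry = A AND B, Sum = A XOR B


for a in (0, 1):
    for b in (0, 1):
        carry, s = half_adder(a, b)
        # The two output bits read as a binary number equal A + B.
        assert 2 * carry + s == a + b
        print(a, b, carry, s)
```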

Full Adder Primitive Component

Functional Overview

The Full Adder (FA) is a 1-bit combinational circuit designed to perform the arithmetic addition of three binary inputs: two operand bits and a carry-in bit cascaded from a lower-order significance column. It serves as the foundational core for word-level vector adders, allowing carry propagation across parallel execution stages.

Architecture Selection Trade-Offs

While a Full Adder can be constructed structurally by chaining two existing HalfAdder modules back-to-back with an OR gate, this design implements a flat, optimized Standard Gate-Level Layout (SOP-optimized) instead.

Why the Structural Two-Half-Adder Design Was Rejected

Propagation Delay Bottleneck: In a multi-bit ripple carry chain, the critical path is the time the carry signal takes to propagate from the lowest bit to the highest bit. In the two-half-adder structure, the path from $A$ or $B$ to $C_{\text{out}}$ crosses three gate levels (XOR, AND, OR) and two nested sub-circuit boundaries, even though the $C_{\text{in}}$-to-$C_{\text{out}}$ path is two levels (AND, OR).
Optimized Carry Path: The standard gate-level approach computes $C_{\text{out}}$ from all three inputs in two gate levels: a layer of parallel 2-input AND gates feeding directly into a 3-input OR gate. Every input therefore reaches $C_{\text{out}}$ through the same shallow path, which keeps the per-stage carry delay uniform and minimal when scaled to the full 32-bit width.
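
Both constructions compute the same function; they differ only in structure. The sketch below checks this exhaustively over all eight input combinations. The function names are hypothetical and the model ignores timing.

```python
from itertools import product


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (Carry, Sum) for two 1-bit inputs."""
    return a & b, a ^ b


def full_adder_two_ha(a: int, b: int, cin: int) -> tuple[int, int]:
    """Structural form: two half adders plus an OR gate. Returns (Cout, Sum)."""
    c1, s1 = half_adder(a, b)
    c2, s = half_adder(s1, cin)
    return c1 | c2, s


def full_adder_sop(a: int, b: int, cin: int) -> tuple[int, int]:
    """Flat form: three 2-input ANDs into a 3-input OR. Returns (Cout, Sum)."""
    cout = (a & b) | (b & cin) | (a & cin)
    return cout, a ^ b ^ cin


mismatches = [
    bits
    for bits in product((0, 1), repeat=3)
    if full_adder_two_ha(*bits) != full_adder_sop(*bits)
]
print(mismatches)  # []
```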

Interface Specifications

Input Signals

Pin Name	Bit Width	Direction	Functional Description
`A`	1	Input	Augend bit (Operand A, bit position $i$)
`B`	1	Input	Addend bit (Operand B, bit position $i$)
`Cin`	1	Input	Carry-in bit cascading from the previous lower-significance column ($i-1$)

Output Signals

Pin Name	Bit Width	Direction	Functional Description
`Sum`	1	Output	The localized binary sum of inputs `A`, `B`, and `Cin`
`Cout`	1	Output	The carry-out bit generated to overflow into the next higher-significance column ($i+1$)

Mathematical & Boolean Formulations

The Standard Gate-Level Layout optimizes execution paths by deriving outputs directly from the primary inputs using parallel Boolean expressions.

The Sum Logic Path

The binary sum represents an odd parity check across the three input variables, resolved using cascaded 2-input Exclusive-OR (XOR) logic gates: $$Sum = A \oplus B \oplus Cin$$

The Carry-Out Logic Path

The carry-out is asserted if at least two of the three inputs are high. This is evaluated using a parallel Sum-of-Products (SOP) configuration: $$Cout = (A \cdot B) + (B \cdot Cin) + (A \cdot Cin)$$

Truth Table Matrix

Input `A`	Input `B`	Input `Cin`	Output `Cout`	Output `Sum`	Equivalent Decimal Evaluation
`0`	`0`	`0`	`0`	`0`	$0 + 0 + 0 = 0$
`0`	`1`	`0`	`0`	`1`	$0 + 1 + 0 = 1$
`1`	`0`	`0`	`0`	`1`	$1 + 0 + 0 = 1$
`1`	`1`	`0`	`1`	`0`	$1 + 1 + 0 = 2$
`0`	`0`	`1`	`0`	`1`	$0 + 0 + 1 = 1$
`0`	`1`	`1`	`1`	`0`	$0 + 1 + 1 = 2$
`1`	`0`	`1`	`1`	`0`	$1 + 0 + 1 = 2$
`1`	`1`	`1`	`1`	`1`	$1 + 1 + 1 = 3$
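
The table rows can be regenerated from the Sum and Cout equations and checked against ordinary addition. A minimal sketch, with `full_adder` as a hypothetical name and rows emitted in the same order as the table (`Cin` as the slowest-changing input):

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (Cout, Sum) from the SOP carry and XOR sum equations."""
    cout = (a & b) | (b & cin) | (a & cin)
    return cout, a ^ b ^ cin


rows = []
for cin in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            cout, s = full_adder(a, b, cin)
            # Cout and Sum read as a 2-bit number equal A + B + Cin.
            assert 2 * cout + s == a + b + cin
            rows.append((a, b, cin, cout, s))

for row in rows:
    print(*row)
```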
