Instruction Set Architecture

The CIMFlow ISA defines the programming interface for SRAM-based Compute-in-Memory accelerators. It maps neural network operations to hardware through a hierarchical abstraction model, enabling efficient compilation and execution of deep learning workloads.

Hardware Hierarchy

The ISA models CIM hardware at three abstraction levels:

Chip

Multi-core topology with NoC and global memory

Core

Execution unit with local memory and registers

Unit

CIM arrays, vector ALU, and scalar control
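The three levels can be captured as nested records, which is roughly what a compiler or simulator needs to know: how many cores exist and what each one contains. The sketch below is illustrative only. The class names, `make_chip`, and capacity fields such as `local_mem_bytes` are hypothetical and not part of CIMFlow, and the NoC topology is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Unit:
    """Compute resources inside one core (field names are illustrative)."""
    num_cim_arrays: int          # hypothetical: number of CIM arrays
    has_vector_alu: bool = True
    has_scalar_control: bool = True


@dataclass
class Core:
    """Execution unit with local memory and its own register files."""
    core_id: int
    local_mem_bytes: int         # hypothetical capacity parameter
    unit: Unit
    grf: List[int] = field(default_factory=lambda: [0] * 32)  # r0-r31
    srf: List[int] = field(default_factory=lambda: [0] * 32)  # special IDs 0-31


@dataclass
class Chip:
    """Top level: a set of cores plus shared global memory (NoC not modeled)."""
    global_mem_bytes: int        # hypothetical capacity parameter
    cores: List[Core]


def make_chip(num_cores: int, arrays_per_core: int,
              local_mem_bytes: int, global_mem_bytes: int) -> Chip:
    """Build a homogeneous chip in which every core has identical resources."""
    cores = [
        Core(core_id=i, local_mem_bytes=local_mem_bytes,
             unit=Unit(num_cim_arrays=arrays_per_core))
        for i in range(num_cores)
    ]
    return Chip(global_mem_bytes=global_mem_bytes, cores=cores)


# Example values only; they are not CIMFlow hardware parameters.
chip = make_chip(num_cores=4, arrays_per_core=2,
                 local_mem_bytes=64 * 1024, global_mem_bytes=1 << 20)
```

Using `default_factory` gives every core its own register lists, so writing to one core's GRF never aliases another's.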

Each core executes its own instruction stream and coordinates with others through explicit synchronization primitives.
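To make per-core streams and explicit synchronization concrete, here is a toy round-robin interpreter in which each core advances one instruction per step and a `BARRIER` parks a core until all cores have arrived. The tuple instruction format and `run_cores` are hypothetical, and the real barrier semantics (for example, whether a barrier can cover only a subset of cores) are not specified in this section and may differ.

```python
from typing import List, Tuple

Instr = Tuple[str, ...]  # hypothetical toy format: (mnemonic, *operands)


def run_cores(streams: List[List[Instr]]) -> List[List[Tuple[int, str]]]:
    """Round-robin model: each active core executes one instruction per step.

    A core whose next instruction is BARRIER stalls until every core is
    parked at a BARRIER; then all cores step past it together. Returns,
    per core, the (step, mnemonic) pairs of executed non-barrier instructions.
    """
    n = len(streams)
    pc = [0] * n
    trace: List[List[Tuple[int, str]]] = [[] for _ in range(n)]
    step = 0
    while any(pc[i] < len(streams[i]) for i in range(n)):
        waiting = [pc[i] < len(streams[i]) and streams[i][pc[i]][0] == "BARRIER"
                   for i in range(n)]
        if all(waiting):
            # Every core has arrived: release the barrier for all of them.
            pc = [p + 1 for p in pc]
            step += 1
            continue
        progressed = False
        for i in range(n):
            if pc[i] < len(streams[i]) and not waiting[i]:
                trace[i].append((step, streams[i][pc[i]][0]))
                pc[i] += 1
                progressed = True
        if not progressed:
            # Some cores wait at a barrier that a finished core will never reach.
            raise RuntimeError("deadlock: barrier can never be satisfied")
        step += 1
    return trace


trace = run_cores([
    [("A",), ("BARRIER",), ("B",)],
    [("C",), ("D",), ("E",), ("BARRIER",), ("F",)],
])
```

Here core 0 idles at the barrier while core 1 finishes D and E, and B and F both execute only after the barrier is released.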

Instruction Categories

Instructions follow RISC design principles with a fixed 32-bit encoding. They fall into three categories:

Compute

Matrix-vector multiplication, vector element-wise operations, and scalar arithmetic.
Examples: CIM_MVM · VEC_OP · REDUCE · SC_RR · SC_RI

Communication

Memory load/store, data movement, and inter-core send/receive.
Examples: SC_LD · SC_ST · MEM_CPY · SEND · RECV

Control Flow

Conditional branches, jumps, barriers, and synchronization tags.
Examples: BRANCH · JMP · WAIT · BARRIER · TAG
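A compiler pass or trace profiler might classify instructions by category, for instance to report how much of a program is compute versus data movement. This sketch uses only the example mnemonics listed above, so it is not a complete opcode table; `CATEGORIES`, `category_of`, and `category_mix` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# Only the example mnemonics listed above; the full ISA has more opcodes.
CATEGORIES: Dict[str, Set[str]] = {
    "Compute": {"CIM_MVM", "VEC_OP", "REDUCE", "SC_RR", "SC_RI"},
    "Communication": {"SC_LD", "SC_ST", "MEM_CPY", "SEND", "RECV"},
    "Control Flow": {"BRANCH", "JMP", "WAIT", "BARRIER", "TAG"},
}


def category_of(mnemonic: str) -> str:
    """Return the category of one of the listed example mnemonics."""
    for name, ops in CATEGORIES.items():
        if mnemonic in ops:
            return name
    raise KeyError(f"{mnemonic!r} is not among the listed examples")


def category_mix(program: Iterable[str]) -> Counter:
    """Count how many instructions of each category a program contains."""
    return Counter(category_of(op) for op in program)


mix = category_mix(["SC_LD", "CIM_MVM", "CIM_MVM", "SEND", "BARRIER"])
```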

Register Files

Each core maintains two register files (see the Register Files page for the complete map):

General Register File (GRF): 32 registers (r0–r31) for addresses, counters, and arithmetic
Special Register File (SRF): 32 registers for CIM configuration (IDs 0–15) and vector parameters (IDs 16–31)
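The register map above reduces to a few constants plus range checks. As an arithmetic aside, any field that names one of 32 registers needs at least 5 bits, since 2^5 = 32. The helper names are hypothetical and the region labels `cim_config` and `vector_param` are illustrative; the meaning of each individual special register ID is defined in the full register map, not here.

```python
GRF_SIZE = 32                  # r0-r31
SRF_SIZE = 32                  # special register IDs 0-31
SRF_CIM_CONFIG = range(0, 16)  # IDs 0-15: CIM configuration
SRF_VECTOR = range(16, 32)     # IDs 16-31: vector parameters

# The two SRF regions tile the whole file with no gaps or overlap.
assert len(SRF_CIM_CONFIG) + len(SRF_VECTOR) == SRF_SIZE

# Bits needed to name any one of 32 registers: 2**5 == 32.
REG_INDEX_BITS = (GRF_SIZE - 1).bit_length()


def grf_name(index: int) -> str:
    """Assembly-style name for a general register index."""
    if not 0 <= index < GRF_SIZE:
        raise ValueError(f"GRF index out of range: {index}")
    return f"r{index}"


def srf_region(special_id: int) -> str:
    """Classify a special register ID into its region (labels are illustrative)."""
    if special_id in SRF_CIM_CONFIG:
        return "cim_config"
    if special_id in SRF_VECTOR:
        return "vector_param"
    raise ValueError(f"SRF ID out of range: {special_id}")
```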

Design Principles

CIM-Native Operations

CIM_MVM directly drives the in-memory compute array for matrix-vector multiplication.
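Functionally, a CIM_MVM computes a matrix-vector product over weights already resident in the array. The reference model below assumes each input element drives one array row and each column accumulates one output, a common SRAM CIM arrangement that is assumed here rather than taken from the CIMFlow specification. `cim_mvm` is a hypothetical name, and bit-serial timing and analog effects are ignored.

```python
from typing import List


def cim_mvm(weights: List[List[int]], x: List[int]) -> List[int]:
    """Reference result of one matrix-vector multiply on a CIM array.

    Assumed layout: weights[i][j] sits at row i, column j; input x[i]
    drives row i and column j accumulates y[j] = sum_i x[i] * weights[i][j].
    """
    if len(weights) != len(x):
        raise ValueError("input length must equal the number of array rows")
    cols = len(weights[0]) if weights else 0
    return [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(cols)]


y = cim_mvm([[1, 2], [3, 4], [5, 6]], [1, 0, 2])  # -> [11, 14]
```

A model like this is mainly useful as a golden reference when checking compiler output or a cycle-level simulator.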

Explicit Parallelism

SEND and RECV move data between cores; WAIT, BARRIER, and TAG handle synchronization.
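One way to picture explicit SEND/RECV is as point-to-point FIFO channels between cores. In this toy model (the `Network` class and its methods are hypothetical), `recv` returns `None` when no message is waiting, whereas a hardware RECV would presumably stall until data arrives. FIFO ordering per core pair is also an assumption.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class Network:
    """Toy inter-core fabric: one FIFO channel per (source, destination) pair."""

    def __init__(self) -> None:
        self.channels: Dict[Tuple[int, int], Deque[List[int]]] = {}

    def send(self, src: int, dst: int, payload: List[int]) -> None:
        """Enqueue a copy of payload on the src -> dst channel."""
        self.channels.setdefault((src, dst), deque()).append(list(payload))

    def recv(self, dst: int, src: int) -> Optional[List[int]]:
        """Dequeue the oldest src -> dst message, or None if none is pending.

        A hardware RECV would presumably stall here instead of returning None.
        """
        queue = self.channels.get((src, dst))
        if not queue:
            return None
        return queue.popleft()


net = Network()
net.send(0, 1, [1, 2])
net.send(0, 1, [3])
```

Because each RECV names its source, messages from different senders never interleave within one channel.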

Uniform Encoding

Every instruction is 32 bits wide, carries a 6-bit opcode, and uses one of five encoding types (R, I-A, I-B, I-C, J).
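A fixed 32-bit word with a 6-bit opcode leaves 26 bits for type-specific fields and allows at most 2^6 = 64 distinct opcodes. The sketch assumes the opcode occupies the top six bits [31:26], as in MIPS; that placement and the `encode`/`decode` helpers are assumptions, and the operand layouts of the R, I-A, I-B, I-C, and J types are deliberately left opaque as a single payload.

```python
from typing import Tuple

WORD_BITS = 32
OPCODE_BITS = 6
PAYLOAD_BITS = WORD_BITS - OPCODE_BITS   # 26 bits left for operand fields
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1
MAX_OPCODES = 1 << OPCODE_BITS            # 64


def encode(opcode: int, payload: int) -> int:
    """Pack a 6-bit opcode (assumed at bits [31:26]) with a 26-bit payload."""
    if not 0 <= opcode < MAX_OPCODES:
        raise ValueError(f"opcode does not fit in {OPCODE_BITS} bits: {opcode}")
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError(f"payload does not fit in {PAYLOAD_BITS} bits: {payload}")
    return (opcode << PAYLOAD_BITS) | payload


def decode(word: int) -> Tuple[int, int]:
    """Split a 32-bit word into (opcode, payload) under the same assumption."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError(f"not a 32-bit word: {word}")
    return word >> PAYLOAD_BITS, word & PAYLOAD_MASK
```

A full decoder would then dispatch on the opcode to pick one of the five types and slice the payload into that type's fields.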

Configurable Precision

The SRF sets the input, output, and weight bit widths, enabling mixed-precision compute.
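To show what separate input, weight, and output widths mean for the arithmetic, the sketch below saturates inputs and weights to their widths, accumulates exactly, then shifts each sum right and saturates it to the output width. Signed two's-complement values, saturation rather than wrap-around, and the `shift` parameter are all assumptions for illustration; the actual SRF fields and rounding behavior are not described in this section, and `mixed_precision_mvm` is a hypothetical name.

```python
from typing import List


def saturate(value: int, bits: int) -> int:
    """Clamp value to the signed two's-complement range of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def mixed_precision_mvm(weights: List[List[int]], x: List[int],
                        in_bits: int, w_bits: int, out_bits: int,
                        shift: int) -> List[int]:
    """MVM with separate input, weight, and output widths (hypothetical model).

    Inputs and weights are saturated to their widths, products are summed
    exactly (Python ints do not overflow), and each sum is shifted right by
    `shift` bits (arithmetic shift) and then saturated to `out_bits`.
    """
    if len(weights) != len(x):
        raise ValueError("input length must equal the number of weight rows")
    xs = [saturate(v, in_bits) for v in x]
    ws = [[saturate(w, w_bits) for w in row] for row in weights]
    cols = len(ws[0]) if ws else 0
    acc = [sum(xs[i] * ws[i][j] for i in range(len(xs))) for j in range(cols)]
    return [saturate(a >> shift, out_bits) for a in acc]


# 200 saturates to 127 at 8 bits; sums 118 and 242 become 59 and 121 after >> 1.
y = mixed_precision_mvm([[1, 2], [3, 4]], [200, -3],
                        in_bits=8, w_bits=4, out_bits=8, shift=1)
```

With `shift=0` the second sum (242) no longer fits in 8 bits and saturates to 127, which is the kind of overflow a compiler must avoid when choosing output widths and scaling.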