Matrix–Vector Operation
Computing-in-memory matrix-vector multiplication instructions
Matrix–vector operations leverage the CIM array for efficient parallel multiplication between weight matrices and input vectors, enabling high-performance deep learning inference.
CIM_MVM
Performs matrix-vector multiplication (MVM) using the CIM array, computing the product of a weight matrix stored in CIM and an input feature vector from local memory. Supports batch operations via configurable flag bits.
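The arithmetic that CIM_MVM is described as performing can be sketched in software as a plain reference model. The function name `cim_mvm`, the row-major list-of-lists layout, and the treatment of a batch as a list of input vectors are assumptions made for illustration; they do not describe how the hardware stores or sequences data.

```python
def cim_mvm(weights, inputs):
    """Reference model of y = W x for one or more input vectors.

    weights: list of rows (out_dim x in_dim), standing in for the matrix
             held in the CIM array (layout is an assumption).
    inputs:  list of input vectors, each of length in_dim; a list with one
             element models a single operation, more elements model a batch.
    Returns one output vector per input vector.
    """
    in_dim = len(weights[0])
    outputs = []
    for x in inputs:
        if len(x) != in_dim:
            raise ValueError(f"expected vector of length {in_dim}, got {len(x)}")
        # Each output element is the dot product of one weight row with x.
        outputs.append([sum(w * v for w, v in zip(row, x)) for row in weights])
    return outputs


W = [[1, 2, 3],
     [4, 5, 6]]

single = cim_mvm(W, [[1, 0, -1]])             # one input vector
batch = cim_mvm(W, [[1, 0, -1], [1, 1, 1]])   # two input vectors, same weights
print(single)
print(batch)
```

A model like this is mainly useful as a golden reference when checking results read back from the output buffer.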
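CIM_MVM rs, rt, re, rf, [flags]
CIM[GRF[re]] × MEM[GRF[rs]:GRF[rt]]
In the examples below, rs holds the input vector address, rt the input vector length, re the location of the weight matrix in the CIM array, and rf the operation count (1 for a single operation, or the batch size when BATCH is set).
Operation Flags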
The flags field controls execution modes for CIM matrix-vector multiplication. Multiple flags can be combined to optimize computation.
Group Mode (written GRP in the examples)
Group Input Mode
Batch Processing (written BATCH in the examples)
Accumulation (ACC) is always enabled: results are accumulated into the output buffer. This flag is implicit and is not exposed in the assembly syntax.
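Because accumulation is always on, successive operations that target the same output buffer add to whatever is already there. The sketch below models that behaviour in software and shows one consequence: an input vector split into tiles yields the same result as a single full-width product. The helper `mvm_accumulate`, the tile size, and the assumption that the buffer starts at zero are illustrative; this section does not say how the buffer is cleared.

```python
def mvm_accumulate(out_buf, weights, x):
    """Add W x into out_buf in place (models the implicit ACC behaviour)."""
    for i, row in enumerate(weights):
        out_buf[i] += sum(w * v for w, v in zip(row, x))
    return out_buf


W = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
x = [1, -1, 2, 0]

# One full-width operation into a zeroed buffer.
full = mvm_accumulate([0, 0], W, x)

# The same product as two half-width operations accumulating into one buffer.
tiled = [0, 0]
for start in (0, 2):
    w_tile = [row[start:start + 2] for row in W]
    mvm_accumulate(tiled, w_tile, x[start:start + 2])

# Running the operation again without clearing adds a second copy.
twice = mvm_accumulate(list(full), W, x)

print(full, tiled, twice)
```

The last case is the one to watch for in practice: if a stale result is still in the buffer, the next product is added on top of it.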
Examples
; Example 1: Basic matrix-vector multiplication
; y = W × x, where W is 128×256, x is 256×1
G_LI r1, 0x1000 ; Input vector address
G_LI r2, 256 ; Input vector length
G_LI r3, 0x0 ; Weight matrix in CIM[0]
G_LI r4, 1 ; Single operation
CIM_MVM r1, r2, r3, r4 ; Result stored in CIM output buffer
; Example 2: Batch processing
; Process 16 input vectors
G_LI r1, 0x2000 ; First input vector
G_LI r2, 512 ; Vector length
G_LI r3, 0x1000 ; Weight matrix in CIM[0x1000]
G_LI r4, 16 ; Batch size = 16
CIM_MVM r1, r2, r3, r4, BATCH ; Batch processing mode
; Example 3: Grouped computation
; Use grouped mode for parallel processing
G_LI r1, 0x3000 ; Input address
G_LI r2, 1024 ; Vector length
G_LI r3, 0x0 ; Weight matrix
G_LI r4, 1 ; Single operation
CIM_MVM r1, r2, r3, r4, GRP ; Grouped computation
; Example 4: Multi-layer inference
; Sequential MVM for 3 layers
S_LI INPUT_BITWIDTH, 8 ; Configure CIM: INT8 input
S_LI OUTPUT_BITWIDTH, 32 ; INT32 output
; Layer 1: 784 → 512
G_LI r1, 0x1000
G_LI r2, 784
G_LI r3, 0x0
G_LI r4, 1
CIM_MVM r1, r2, r3, r4
; Apply activation (omitted for brevity)
; Layer 2: 512 → 256
G_LI r1, 0x2000 ; Layer 1 output
G_LI r2, 512
G_LI r3, 0x10000 ; Layer 2 weights
G_LI r4, 1
CIM_MVM r1, r2, r3, r4
; Layer 3: 256 → 10
G_LI r1, 0x3000 ; Layer 2 output
G_LI r2, 256
G_LI r3, 0x20000 ; Layer 3 weights
G_LI r4, 1
CIM_MVM r1, r2, r3, r4
; Example 5: Batch inference for throughput
; Process 32 images in a batch
G_LI r1, 0x10000 ; First image features
G_LI r2, 784 ; Feature dimension
G_LI r3, 0x0 ; Weight matrix
G_LI r4, 32 ; Batch count
CIM_MVM r1, r2, r3, r4, BATCH ; Batch processing
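The layer chain in Example 4 (784 → 512 → 256 → 10, INT8 inputs, INT32 outputs) can be mirrored by a small software model to check shapes and value ranges. Everything beyond those dimensions and bit widths is an assumption here: the weights are synthetic, and `relu_requantize` is a hypothetical stand-in for the activation step that the example omits.

```python
INT8_MAX = 127
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def mvm(weights, x):
    """y = W x with Python integers standing in for INT32 accumulators."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu_requantize(acc, shift=8):
    """Hypothetical activation: ReLU, scale down, clamp to the INT8 range."""
    return [min(max(a, 0) >> shift, INT8_MAX) for a in acc]


def make_weights(out_dim, in_dim):
    """Deterministic synthetic weights in [-3, 3]; not real model data."""
    return [[((i * 31 + j * 17) % 7) - 3 for j in range(in_dim)]
            for i in range(out_dim)]


layer_dims = [(784, 512), (512, 256), (256, 10)]   # (in_dim, out_dim)
layers = [make_weights(out_dim, in_dim) for in_dim, out_dim in layer_dims]

activations = [k % 16 for k in range(784)]          # synthetic INT8 input
logits = None
for idx, W in enumerate(layers):
    acc = mvm(W, activations)
    # Every accumulator must fit the configured 32-bit output width.
    assert all(INT32_MIN <= a <= INT32_MAX for a in acc)
    if idx < len(layers) - 1:
        activations = relu_requantize(acc)           # feeds the next layer
    else:
        logits = acc                                 # final layer output

print(len(logits))
```

The output of each layer has to be brought back to the input bit width before it can feed the next CIM_MVM, which is why some requantization step is needed between layers.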