CIMFlow

Vector Operation

SIMD vector operations and reduction instructions

Vector operations enable Single Instruction, Multiple Data (SIMD) processing for efficient parallel computation on arrays. These instructions support element-wise operations, activation functions, quantization, and aggregation across vector elements.


VEC_OP

Performs SIMD operations on one to four input vectors, applying the same operation to corresponding elements in parallel. Supported functions include element-wise arithmetic, activation functions (GELU, softmax), and quantization.

Bits   Field   Meaning
31:26  opcode  01XY00
25:21  rs      Input 1
20:16  rt      Input 2
15:11  rd      Output
10:6   re      Length
5:0    funct   Operation

Syntax:    VEC_OP_Z rs, rt, rd, re
Operation: MEM[GRF[rd]] ← funct(inputs, GRF[re])
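The following minimal Python sketch packs the fields above into a 32-bit instruction word. The helper names (`vec_op_opcode`, `encode_vec_op`) are hypothetical. The sketch assumes only what the field layout states: 5-bit register indices, a 6-bit funct, and an opcode of the form 01XY00 in which XY equals Z.

```python
# Hypothetical VEC_OP encoder. Layout (from the field table):
# opcode[31:26] rs[25:21] rt[20:16] rd[15:11] re[10:6] funct[5:0]


def vec_op_opcode(z: int) -> int:
    """Return the 6-bit opcode 01XY00 for Z in 0..3 (XY = Z)."""
    if not 0 <= z <= 3:
        raise ValueError("Z must be in 0..3")
    return 0b010000 | (z << 2)


def encode_vec_op(z: int, rs: int, rt: int, rd: int, re: int, funct: int) -> int:
    """Pack register indices and funct into a 32-bit instruction word."""
    for name, value in (("rs", rs), ("rt", rt), ("rd", rd), ("re", re)):
        if not 0 <= value < 32:
            raise ValueError(f"{name} must be a 5-bit register index")
    if not 0 <= funct < 64:
        raise ValueError("funct must fit in 6 bits")
    return (
        (vec_op_opcode(z) << 26)
        | (rs << 21)
        | (rt << 16)
        | (rd << 11)
        | (re << 6)
        | funct
    )


# VEC_ADD (funct 0) with two inputs (Z=1): rs=r1, rt=r2, rd=r3, re=r4
word = encode_vec_op(z=1, rs=1, rt=2, rd=3, re=4, funct=0)
print(hex(word))  # 0x50221900
```

The call uses the same register roles as the vector-add example in the Examples section: r1 and r2 hold the input addresses, r3 holds the output address, and r4 holds the length. Because the encoder takes the fields by name, it does not depend on the order in which the operands are written in assembly.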

Input Vector Modes

In the instruction syntax, the suffix Z selects how many input vectors the instruction reads. Z is encoded in opcode bits XY:

Opcode  Z  Input vectors  Operation class
010000  0  1              Unary
010100  1  2              Binary
011000  2  3              Ternary
011100  3  4              Quaternary

For operations with more than two input vectors, the addresses of vectors 3 and 4 are taken from dedicated vector registers, which are configured separately.
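The mode table can also be read in reverse. This hypothetical helper, `decode_vec_op_mode`, recovers Z and the input-vector count from a 6-bit opcode. It rejects any value that does not match the 01XY00 pattern, such as the REDUCE opcode 010001.

```python
def decode_vec_op_mode(opcode: int) -> tuple:
    """Map a 6-bit VEC_OP opcode 01XY00 to (Z, number of input vectors)."""
    # Bits 5:4 must be 01 and bits 1:0 must be 00; bits 3:2 carry XY = Z.
    if (opcode & 0b110011) != 0b010000:
        raise ValueError(f"{opcode:06b} is not a VEC_OP opcode")
    z = (opcode >> 2) & 0b11
    return z, z + 1


for op in (0b010000, 0b010100, 0b011000, 0b011100):
    z, n = decode_vec_op_mode(op)
    print(f"{op:06b}: Z={z}, {n} input vector(s)")
```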

Function Codes

Complete list of SIMD vector operation function codes (funct field) as defined in the implementation:

Name                 funct  Operation                Description
VEC_ADD              0      Vector addition          Element-wise addition
VEC_SC_ADD           1      Scalar addition          Add scalar to vector
VEC_MUL              2      Vector multiplication    Element-wise multiplication
VEC_QUANTIZE         3      Quantization             Convert to lower precision
VEC_RESADD_QUANTIZE  4      Residual add + quantize  Fused operation
VEC_RESMUL_QUANTIZE  5      Residual mul + quantize  Fused operation
VEC_VVMAX            6      Vector maximum           Element-wise maximum
VEC_VSMUL            7      Vector-scalar multiply   Multiply vector by scalar
VEC_VFLOOR           8      Vector floor             Floor operation
VEC_VSET             9      Vector set               Set vector values
VEC_SOFTMAX          10     Softmax activation       Normalized exponential
VEC_REDUCE_MAX       11     Reduce maximum           Find max element
VEC_V_EXP            12     Vector exponential       e^x operation
VEC_REDUCE_SUM       13     Reduce sum               Sum all elements
VEC_VS_DIV           14     Vector-scalar divide     Divide vector by scalar
VEC_VS_SUB           15     Vector-scalar subtract   Subtract scalar from vector
VEC_SQRT             16     Vector square root       Element-wise sqrt
VEC_GELU             17     GELU activation          Gaussian Error Linear Unit
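The element-wise codes can be checked against a hypothetical pure-Python golden model like the one below. The model makes several assumptions. It uses plain floating-point arithmetic. For the scalar-operand codes, it passes the scalar directly as a value. It uses the exact erf form of GELU. The hardware's precision, rounding, scalar sourcing, and GELU approximation are not specified in this section. The model deliberately omits codes whose semantics the table leaves open: quantization, the fused residual variants, VEC_VSET, softmax, and the reductions.

```python
import math

# Hypothetical golden model: funct code -> per-element function.
# Precision, rounding, and the hardware GELU approximation are assumptions.
ELEMENTWISE = {
    0: lambda a, b: a + b,   # VEC_ADD: element-wise addition
    1: lambda a, s: a + s,   # VEC_SC_ADD: add scalar to vector
    2: lambda a, b: a * b,   # VEC_MUL: element-wise multiplication
    6: max,                  # VEC_VVMAX: element-wise maximum
    7: lambda a, s: a * s,   # VEC_VSMUL: multiply vector by scalar
    8: math.floor,           # VEC_VFLOOR: floor
    12: math.exp,            # VEC_V_EXP: e^x
    14: lambda a, s: a / s,  # VEC_VS_DIV: divide vector by scalar
    15: lambda a, s: a - s,  # VEC_VS_SUB: subtract scalar from vector
    16: math.sqrt,           # VEC_SQRT: element-wise sqrt
    17: lambda a: 0.5 * a * (1.0 + math.erf(a / math.sqrt(2.0))),  # VEC_GELU (erf form)
}

# Codes whose second operand is a single scalar rather than a vector.
SCALAR_OPERAND = {1, 7, 14, 15}


def vec_op(funct, length, *operands):
    """Apply funct to the first `length` elements of the operand vectors."""
    fn = ELEMENTWISE[funct]
    if funct in SCALAR_OPERAND:
        vec, scalar = operands
        return [fn(vec[i], scalar) for i in range(length)]
    return [fn(*(v[i] for v in operands)) for i in range(length)]


print(vec_op(0, 3, [1, 2, 3], [10, 20, 30]))            # [11, 22, 33]
print(vec_op(6, 3, [-1.5, 0.0, 2.0], [0.0, 0.0, 0.0]))  # [0.0, 0.0, 2.0]
```

The second call shows a property of this model: a ReLU-style clamp can be written as VEC_VVMAX against an all-zero vector.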

REDUCE

Performs reduction operations that aggregate the elements of a vector into a single scalar value. Two reductions are supported: the sum of all elements and the maximum element.

Bits   Field   Meaning
31:26  opcode  010001
25:21  rs      Input address
20:16  rt      Length
15:11  rd      Output address
10:6   0       Reserved
5:0    funct   Operation
Syntax:    REDUCE_OP rs, rt, rd
Operation: MEM[GRF[rd]] ← reduce(MEM[GRF[rs] : GRF[rs] + GRF[rt]], funct), that is, a reduction over the GRF[rt] elements starting at address GRF[rs]
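The Operation line can be modeled directly. This sketch makes two assumptions: memory is a Python dict holding one element per address, and the GRF is a list of 32 registers. The `reduce_op` name and both representations are hypothetical. The funct values follow the Reduction Functions table.

```python
# Hypothetical REDUCE model: mem maps address -> element (one element per
# address is an assumption); grf is a list of 32 general registers.
REDUCERS = {
    0b000000: max,  # REDUCE_MAX
    0b000001: sum,  # REDUCE_SUM
}


def reduce_op(mem, grf, rs, rt, rd, funct):
    """MEM[GRF[rd]] <- reduce over the GRF[rt] elements starting at GRF[rs]."""
    base, length = grf[rs], grf[rt]
    if length < 1:
        raise ValueError("reduction length must be at least 1")
    elements = [mem[base + i] for i in range(length)]
    result = REDUCERS[funct](elements)
    mem[grf[rd]] = result  # single scalar output
    return result


grf = [0] * 32
grf[19], grf[20], grf[21] = 0xB000, 4, 0xB800  # input addr, length, output addr
mem = {0xB000 + i: v for i, v in enumerate([3, 1, 4, 1])}
print(reduce_op(mem, grf, rs=19, rt=20, rd=21, funct=0b000001))  # 9
print(mem[0xB800])  # 9
```

The register roles match the REDUCE_SUM example in the Examples section: r19 is the input address, r20 the length, and r21 the output address.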

Reduction Functions

Name        funct   Description              Result
REDUCE_MAX  000000  Reduce to maximum value  out = max(in[0..n-1])
REDUCE_SUM  000001  Reduce to sum            out = Σ in[0..n-1]
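With the funct codes known, a REDUCE word can be assembled in the same way as a VEC_OP word. The following is a hypothetical encoder (`encode_reduce`). It leaves the reserved bits 10:6 at zero, as the field table specifies.

```python
REDUCE_OPCODE = 0b010001
REDUCE_FUNCTS = {"REDUCE_MAX": 0b000000, "REDUCE_SUM": 0b000001}


def encode_reduce(rs, rt, rd, name):
    """Pack a REDUCE instruction; bits 10:6 are reserved and left as zero."""
    for reg_name, value in (("rs", rs), ("rt", rt), ("rd", rd)):
        if not 0 <= value < 32:
            raise ValueError(f"{reg_name} must be a 5-bit register index")
    funct = REDUCE_FUNCTS[name]
    return (REDUCE_OPCODE << 26) | (rs << 21) | (rt << 16) | (rd << 11) | funct


# Register roles from the REDUCE_MAX example: r11 input, r12 length, r13 output.
print(hex(encode_reduce(rs=11, rt=12, rd=13, name="REDUCE_MAX")))  # 0x456c6800
```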

A reduction always writes a single scalar to MEM[GRF[rd]], regardless of the input vector length.


Examples

; Vector add (2 inputs)
G_LI r1, 0x1000           ; Vector a address
G_LI r2, 0x2000           ; Vector b address
G_LI r3, 0x3000           ; Output vector address
G_LI r4, 256              ; Vector length
VEC_ADD_1 r3, r1, r2, r4  ; input_num=2 (Z=1)

; GELU activation (unary, placeholder r0)
G_LI r5, 0x4000           ; Input vector
G_LI r6, 0x5000           ; Output vector
G_LI r7, 256              ; Length
VEC_GELU_0 r6, r5, r0, r7 ; input_num=1 (Z=0)

; Quantize to INT8 (unary)
G_LI r8, 0x6000           ; FP32 input
G_LI r9, 0x7000           ; INT8 output
G_LI r10, 512             ; Length
VEC_QUANTIZE_0 r9, r8, r0, r10 ; input_num=1 (Z=0)

; Softmax with max reduction
G_LI r11, 0x8000          ; Logits vector
G_LI r12, 256             ; Classes/length
G_LI r13, 0x8100          ; Temp scalar addr for max
REDUCE_MAX r13, r11, r12
G_LI r14, 0x8200          ; Output probabilities
VEC_SOFTMAX_0 r14, r11, r0, r12 ; input_num=1 (Z=0)

; Vector-scalar multiply
G_LI r15, 0x9000          ; Input vector
G_LI r16, 0xA000          ; Output vector
G_LI r17, 128             ; Length
G_LI r18, 0x3F800000      ; Scalar in GRF (1.0 as bits)
VEC_VSMUL_1 r16, r15, r18, r17 ; input_num=2 (Z=1)

; Reduce sum to scalar
G_LI r19, 0xB000          ; Input vector
G_LI r20, 512             ; Length
G_LI r21, 0xB800          ; Output scalar addr
REDUCE_SUM r21, r19, r20

; Element-wise max of two vectors
G_LI r22, 0xC000          ; Vector a
G_LI r23, 0xD000          ; Vector b
G_LI r24, 0xE000          ; Output vector
G_LI r25, 256             ; Length
VEC_VVMAX_1 r24, r22, r23, r25 ; input_num=2 (Z=1)
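Function codes 11 through 15 (reduce max, e^x, reduce sum, vector-scalar divide, vector-scalar subtract) line up with the steps of a numerically stable softmax. The following Python sketch shows that decomposition. This section does not state whether VEC_SOFTMAX performs these steps internally, so treat the mapping to funct codes as an illustrative assumption. The function name `softmax_via_primitives` is hypothetical.

```python
import math


def softmax_via_primitives(x):
    """Numerically stable softmax built from the steps named by funct codes 11-15."""
    m = max(x)                             # reduce max       (VEC_REDUCE_MAX, 11)
    shifted = [v - m for v in x]           # vector - scalar  (VEC_VS_SUB, 15)
    exps = [math.exp(v) for v in shifted]  # element-wise e^x (VEC_V_EXP, 12)
    total = sum(exps)                      # reduce sum       (VEC_REDUCE_SUM, 13)
    return [e / total for e in exps]       # vector / scalar  (VEC_VS_DIV, 14)


print(softmax_via_primitives([1000.0, 1000.0]))  # [0.5, 0.5]
```

Subtracting the maximum first keeps every exponent at or below zero. This is why inputs such as 1000.0 do not overflow `math.exp`, and it explains why a max reduction naturally precedes softmax.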
