Vector Operations
SIMD vector operations and reduction instructions
Vector operations enable Single Instruction, Multiple Data (SIMD) processing for efficient parallel computation on arrays. These instructions support element-wise operations, activation functions, quantization, and aggregation across vector elements.
VEC_OP
Performs SIMD operations on 1-4 input vectors, applying the same operation to corresponding elements in parallel. Supports various functions including activation functions (ReLU), quantization, and arithmetic operations.
VEC_OP_Z rs, rt, rd, re
MEM[GRF[rd]] ← funct(inputs, GRF[re])
Input Vector Modes
In the instruction syntax, Z determines the number of input vectors.
Z corresponds to the opcode bits XY:
Z=0: 1 input vector
Z=1: 2 input vectors
Z=2: 3 input vectors
Z=3: 4 input vectors
For operations with more than two input vectors, the addresses of vectors 3 and 4 are read from dedicated vector registers that are configured separately.
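The element-wise behaviour described above can be sketched as a small Python reference model. This is an illustration only, not the hardware implementation: the names `input_count` and `vec_op` are hypothetical, vectors are modelled as plain Python lists rather than memory addresses, the mapping "Z selects Z + 1 inputs" is extrapolated from the Z=0 and Z=1 examples, and the final operand is assumed to be the vector length, as in the Examples section.

```python
from typing import Callable, List, Sequence


def input_count(z: int) -> int:
    """Number of input vectors selected by the Z suffix (assumed to be Z + 1)."""
    if not 0 <= z <= 3:
        raise ValueError("Z must be in the range 0..3")
    return z + 1


def vec_op(z: int, funct: Callable[..., float],
           inputs: Sequence[Sequence[float]], length: int) -> List[float]:
    """Apply funct to corresponding elements of the input vectors."""
    if len(inputs) != input_count(z):
        raise ValueError("number of input vectors does not match Z")
    if any(len(v) < length for v in inputs):
        raise ValueError("input vector shorter than requested length")
    # One output element per index, computed from the same index of every input.
    return [funct(*(v[i] for v in inputs)) for i in range(length)]


# Usage: a two-input add (Z=1) and a one-input ReLU (Z=0).
total = vec_op(1, lambda a, b: a + b, [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]], 3)
relu = vec_op(0, lambda x: max(x, 0.0), [[-1.0, 0.5, 2.0]], 3)
print(total)  # [11.0, 22.0, 33.0]
print(relu)   # [0.0, 0.5, 2.0]
```

The model deliberately ignores data types, addressing, and the dedicated registers used for vectors 3 and 4. It only shows that every output element depends on the same index of each input.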
Function Codes
Complete list of SIMD vector operation function codes (funct field) as defined in the implementation:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
REDUCE
Performs reduction operations that aggregate vector elements into a single scalar value. Commonly used for computing sums, finding extrema, or other aggregate statistics across arrays.
REDUCE_OP rs, rt, rd
MEM[GRF[rd]] ← reduce(MEM[GRF[rs]:GRF[rt]], funct)
Reduction Functions
Funct codes: 000000, 000001
Reduction operations produce a single scalar output regardless of input vector size, enabling efficient aggregation.
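A reduction can be modelled in the same spirit. The sketch below is a hypothetical Python reference model, not the actual instruction semantics: memory is a flat list indexed by element, the second source operand is treated as a length (as in the Examples section), and the `"sum"` and `"max"` keys are illustrative names that are not tied to any particular funct encoding.

```python
from typing import Callable, Dict, List

# Hypothetical name-to-function table; real funct encodings are not modelled here.
REDUCTIONS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "max": max,
}


def reduce_op(mem: List[float], src: int, length: int, dst: int, funct: str) -> None:
    """Write reduce(mem[src:src+length]) to the single location mem[dst]."""
    if length <= 0:
        raise ValueError("length must be positive")
    mem[dst] = REDUCTIONS[funct](mem[src:src + length])


# Usage: four input elements, two scalar result slots at indices 4 and 5.
mem = [3.0, -1.0, 7.0, 2.0, 0.0, 0.0]
reduce_op(mem, src=0, length=4, dst=4, funct="sum")
reduce_op(mem, src=0, length=4, dst=5, funct="max")
print(mem[4], mem[5])  # 11.0 7.0
```

Whatever the input length, each call writes exactly one value, which is the property the text above describes.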
Examples
; Vector add (2 inputs)
G_LI r1, 0x1000 ; Vector a address
G_LI r2, 0x2000 ; Vector b address
G_LI r3, 0x3000 ; Output vector address
G_LI r4, 256 ; Vector length
VEC_ADD_1 r3, r1, r2, r4 ; input_num=2 (Z=1)
; GELU activation (unary, placeholder r0)
G_LI r5, 0x4000 ; Input vector
G_LI r6, 0x5000 ; Output vector
G_LI r7, 256 ; Length
VEC_GELU_0 r6, r5, r0, r7 ; input_num=1 (Z=0)
; Quantize to INT8 (unary)
G_LI r8, 0x6000 ; FP32 input
G_LI r9, 0x7000 ; INT8 output
G_LI r10, 512 ; Length
VEC_QUANTIZE_0 r9, r8, r0, r10 ; input_num=1 (Z=0)
; Softmax with max reduction
G_LI r11, 0x8000 ; Logits vector
G_LI r12, 256 ; Classes/length
G_LI r13, 0x8100 ; Temp scalar addr for max
REDUCE_MAX r13, r11, r12
G_LI r14, 0x8200 ; Output probabilities
VEC_SOFTMAX_0 r14, r11, r0, r12 ; input_num=1 (Z=0)
; Vector-scalar multiply
G_LI r15, 0x9000 ; Input vector
G_LI r16, 0xA000 ; Output vector
G_LI r17, 128 ; Length
G_LI r18, 0x3F800000 ; Scalar in GRF (1.0 as bits)
VEC_VSMUL_1 r16, r15, r18, r17 ; input_num=2 (Z=1)
; Reduce sum to scalar
G_LI r19, 0xB000 ; Input vector
G_LI r20, 512 ; Length
G_LI r21, 0xB800 ; Output scalar addr
REDUCE_SUM r21, r19, r20
; Element-wise max of two vectors
G_LI r22, 0xC000 ; Vector a
G_LI r23, 0xD000 ; Vector b
G_LI r24, 0xE000 ; Output vector
G_LI r25, 256 ; Length
VEC_VVMAX_1 r24, r22, r23, r25 ; input_num=2 (Z=1)
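The softmax example computes a maximum before the softmax itself. One common reason for doing so is numerical stability: subtracting the maximum logit before exponentiating avoids overflow. Whether VEC_SOFTMAX actually consumes the stored maximum is not stated here, so the Python sketch below only illustrates the general technique. The function name `stable_softmax` is hypothetical.

```python
import math
from typing import List


def stable_softmax(logits: List[float]) -> List[float]:
    """Softmax that subtracts the maximum logit before exponentiating."""
    m = max(logits)                           # the reduction step (max)
    exps = [math.exp(x - m) for x in logits]  # every exponent is <= 0, so no overflow
    total = sum(exps)                         # a second reduction (sum)
    return [e / total for e in exps]


# Usage: these logits would overflow math.exp without the max subtraction.
probs = stable_softmax([1000.0, 1001.0, 1002.0])
print(round(sum(probs), 6))  # 1.0
```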