Vector Operation

Vector operations enable Single Instruction, Multiple Data (SIMD) processing for efficient parallel computation on arrays. These instructions support element-wise operations, activation functions, quantization, and aggregation across vector elements.

VEC_OP

Performs SIMD operations on 1 to 4 input vectors, applying the same operation to corresponding elements in parallel. Supported functions include activation functions (GELU, softmax), quantization, and arithmetic operations.

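| Bits  | 31:26           | 25:21   | 20:16   | 15:11  | 10:6   | 5:0               |
|-------|-----------------|---------|---------|--------|--------|-------------------|
| Field | opcode (01XY00) | input 1 | input 2 | output | length | funct (operation) |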

Syntax

VEC_OP_Z rd, rs, rt, re

rd holds the output address, rs and rt supply input 1 and input 2, and re holds the vector length.

Operation

MEM[GRF[rd]] ← funct(inputs, GRF[re])
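
As an illustration of the field layout, the following Python sketch packs the six fields into a 32-bit word. The name encode_vec_op and its parameter names are hypothetical, and the sketch assumes the funct field carries the numeric function code; it is not the toolchain's assembler.

```python
def encode_vec_op(z, input1, input2, output, length, funct):
    """Pack a VEC_OP instruction word from its fields (hypothetical helper).

    z is the input-vector mode (0..3); input1, input2, output and length
    are 5-bit register numbers; funct is the 6-bit function code.
    """
    if not 0 <= z <= 3:
        raise ValueError("z must be in 0..3")
    fields = (("input1", input1), ("input2", input2),
              ("output", output), ("length", length))
    for name, value in fields:
        if not 0 <= value < 32:
            raise ValueError(f"{name} must fit in 5 bits")
    if not 0 <= funct < 64:
        raise ValueError("funct must fit in 6 bits")

    opcode = 0b010000 | (z << 2)      # 01XY00 with XY = z
    return ((opcode << 26) | (input1 << 21) | (input2 << 16)
            | (output << 11) | (length << 6) | funct)


if __name__ == "__main__":
    # Z=1, input registers 1 and 2, output register 3, length register 4, funct 0
    print(hex(encode_vec_op(1, 1, 2, 3, 4, 0)))  # 0x50221900
```

The range checks mirror the field widths in the encoding (5 bits per register field, 6 bits for funct), so an out-of-range value is rejected instead of silently spilling into a neighbouring field.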

Input Vector Modes

In the instruction syntax, the suffix Z selects the number of input vectors, which is Z + 1. Z is the value of the opcode bits XY:

| Opcode | Z | Input vectors | Operation type        |
|--------|---|---------------|-----------------------|
| 010000 | 0 | 1             | Unary operations      |
| 010100 | 1 | 2             | Binary operations     |
| 011000 | 2 | 3             | Ternary operations    |
| 011100 | 3 | 4             | Quaternary operations |

For operations with more than two input vectors, the addresses of vectors 3 and 4 come from dedicated vector registers that are configured separately.
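
The opcode-to-mode mapping above can be recovered with a mask and a shift. The sketch below is illustrative only; the names VEC_OP_MASK, VEC_OP_BASE and vec_op_input_count are hypothetical and not part of the ISA definition.

```python
VEC_OP_MASK = 0b110011   # selects the fixed opcode bits of the 01XY00 pattern
VEC_OP_BASE = 0b010000   # value those fixed bits must have


def vec_op_input_count(opcode):
    """Return the number of input vectors (1..4) for a VEC_OP opcode.

    Returns None when the 6-bit opcode does not match the 01XY00 pattern.
    """
    if (opcode & VEC_OP_MASK) != VEC_OP_BASE:
        return None
    z = (opcode >> 2) & 0b11   # XY bits give Z
    return z + 1


if __name__ == "__main__":
    for op in (0b010000, 0b010100, 0b011000, 0b011100):
        print(f"{op:06b} -> {vec_op_input_count(op)} input vector(s)")
```

An opcode such as 010001 fails the mask check because its two low bits are not 00, so the helper returns None for it.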

Function Codes

Complete list of SIMD vector operation function codes (funct field) as defined in the implementation:

| Function             | Code | Name                   | Description                 |
|----------------------|------|------------------------|-----------------------------|
| VEC_ADD              | 0    | Vector addition        | Element-wise addition       |
| VEC_SC_ADD           | 1    | Scalar addition        | Add scalar to vector        |
| VEC_MUL              | 2    | Vector multiplication  | Element-wise multiplication |
| VEC_QUANTIZE         | 3    | Quantization           | Convert to lower precision  |
| VEC_RESADD_QUANTIZE  | 4    | Residual add + quantize | Fused operation            |
| VEC_RESMUL_QUANTIZE  | 5    | Residual mul + quantize | Fused operation            |
| VEC_VVMAX            | 6    | Vector maximum         | Element-wise maximum        |
| VEC_VSMUL            | 7    | Vector-scalar multiply | Multiply vector by scalar   |
| VEC_VFLOOR           | 8    | Vector floor           | Floor operation             |
| VEC_VSET             | 9    | Vector set             | Set vector values           |
| VEC_SOFTMAX          | 10   | Softmax activation     | Normalized exponential      |
| VEC_REDUCE_MAX       | 11   | Reduce maximum         | Find max element            |
| VEC_V_EXP            | 12   | Vector exponential     | e^x operation               |
| VEC_REDUCE_SUM       | 13   | Reduce sum             | Sum all elements            |
| VEC_VS_DIV           | 14   | Vector-scalar divide   | Divide vector by scalar     |
| VEC_VS_SUB           | 15   | Vector-scalar subtract | Subtract scalar from vector |
| VEC_SQRT             | 16   | Vector square root     | Element-wise sqrt           |
| VEC_GELU             | 17   | GELU activation        | Gaussian Error Linear Unit  |
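
For experimenting with the function codes in software, a small reference model can be handy. The sketch below is an assumption-laden illustration, not the hardware behaviour: the class VecFunct, the table REFERENCE and the helper run are hypothetical names, GELU uses the exact erf form (the hardware may use an approximation), and the quantize, set, softmax and reduce functions are left out because their precise semantics are not specified here.

```python
import math
from enum import IntEnum


class VecFunct(IntEnum):
    """Function codes as listed in the table above."""
    VEC_ADD = 0
    VEC_SC_ADD = 1
    VEC_MUL = 2
    VEC_QUANTIZE = 3
    VEC_RESADD_QUANTIZE = 4
    VEC_RESMUL_QUANTIZE = 5
    VEC_VVMAX = 6
    VEC_VSMUL = 7
    VEC_VFLOOR = 8
    VEC_VSET = 9
    VEC_SOFTMAX = 10
    VEC_REDUCE_MAX = 11
    VEC_V_EXP = 12
    VEC_REDUCE_SUM = 13
    VEC_VS_DIV = 14
    VEC_VS_SUB = 15
    VEC_SQRT = 16
    VEC_GELU = 17


def gelu(x):
    # Exact GELU via erf; an assumption, hardware may approximate it.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Reference semantics for the functions whose meaning is unambiguous.
REFERENCE = {
    VecFunct.VEC_ADD: lambda a, b: [x + y for x, y in zip(a, b)],
    VecFunct.VEC_MUL: lambda a, b: [x * y for x, y in zip(a, b)],
    VecFunct.VEC_VVMAX: lambda a, b: [max(x, y) for x, y in zip(a, b)],
    VecFunct.VEC_SC_ADD: lambda a, s: [x + s for x in a],
    VecFunct.VEC_VSMUL: lambda a, s: [x * s for x in a],
    VecFunct.VEC_VS_SUB: lambda a, s: [x - s for x in a],
    VecFunct.VEC_VS_DIV: lambda a, s: [x / s for x in a],
    VecFunct.VEC_VFLOOR: lambda a: [math.floor(x) for x in a],
    VecFunct.VEC_V_EXP: lambda a: [math.exp(x) for x in a],
    VecFunct.VEC_SQRT: lambda a: [math.sqrt(x) for x in a],
    VecFunct.VEC_GELU: lambda a: [gelu(x) for x in a],
}


def run(funct, *operands):
    """Apply the modelled function; raises KeyError for unmodelled codes."""
    return REFERENCE[VecFunct(funct)](*operands)


if __name__ == "__main__":
    print(run(VecFunct.VEC_ADD, [1, 2], [3, 4]))
    print(run(VecFunct.VEC_VSMUL, [1.0, 2.0], 2.0))
```

Keeping the codes in an IntEnum lets the same table serve both for decoding a funct field and for looking up the reference behaviour.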

REDUCE

Performs reduction operations that aggregate vector elements into a single scalar value. Commonly used for computing sums, finding extrema, or other aggregate statistics across arrays.

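| Bits  | 31:26           | 25:21      | 20:16  | 15:11       | 10:6     | 5:0               |
|-------|-----------------|------------|--------|-------------|----------|-------------------|
| Field | opcode (010001) | input addr | length | output addr | reserved | funct (operation) |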

Syntax

REDUCE_OP rd, rs, rt

rd holds the output address, rs the input address, and rt the vector length.

Operation

MEM[GRF[rd]] ← reduce(funct, the GRF[rt] elements starting at MEM[GRF[rs]])

Reduction Functions

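| Function   | funct  | Description             | Result               |
|------------|--------|-------------------------|----------------------|
| REDUCE_MAX | 000000 | Reduce to maximum value | out = max(in[0..n-1]) |
| REDUCE_SUM | 000001 | Reduce to sum           | out = Σ in[0..n-1]   |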

Reduction operations produce a single scalar output regardless of input vector size, enabling efficient aggregation computations.
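
A minimal Python sketch of the REDUCE encoding and its scalar result follows. The helper names encode_reduce and reduce_reference are hypothetical, and the sketch assumes the reserved field is written as zero; neither is taken from the toolchain.

```python
REDUCE_OPCODE = 0b010001
REDUCE_MAX = 0b000000
REDUCE_SUM = 0b000001


def encode_reduce(input_addr, length, output_addr, funct):
    """Pack a REDUCE instruction; the first three arguments are register numbers."""
    fields = (("input_addr", input_addr), ("length", length),
              ("output_addr", output_addr))
    for name, value in fields:
        if not 0 <= value < 32:
            raise ValueError(f"{name} must be a register number in 0..31")
    if funct not in (REDUCE_MAX, REDUCE_SUM):
        raise ValueError("unknown reduction funct")
    # Assumption: the reserved field (bits 10:6) is encoded as zero.
    return ((REDUCE_OPCODE << 26) | (input_addr << 21) | (length << 16)
            | (output_addr << 11) | funct)


def reduce_reference(values, funct):
    """Software model of the reduction: one scalar from n input elements."""
    if not values:
        raise ValueError("empty input vector")
    if funct == REDUCE_MAX:
        return max(values)
    if funct == REDUCE_SUM:
        return sum(values)
    raise ValueError("unknown reduction funct")


if __name__ == "__main__":
    # Input address in r11, length in r12, output address in r13
    print(hex(encode_reduce(11, 12, 13, REDUCE_MAX)))  # 0x456c6800
    print(reduce_reference([3, 9, 2], REDUCE_SUM))      # 14
```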

Examples

; Vector add (2 inputs)
G_LI r1, 0x1000           ; Vector a address
G_LI r2, 0x2000           ; Vector b address
G_LI r3, 0x3000           ; Output vector address
G_LI r4, 256              ; Vector length
VEC_ADD_1 r3, r1, r2, r4  ; input_num=2 (Z=1)

; GELU activation (unary, placeholder r0)
G_LI r5, 0x4000           ; Input vector
G_LI r6, 0x5000           ; Output vector
G_LI r7, 256              ; Length
VEC_GELU_0 r6, r5, r0, r7 ; input_num=1 (Z=0)

; Quantize to INT8 (unary)
G_LI r8, 0x6000           ; FP32 input
G_LI r9, 0x7000           ; INT8 output
G_LI r10, 512             ; Length
VEC_QUANTIZE_0 r9, r8, r0, r10 ; input_num=1 (Z=0)

; Softmax with max reduction
G_LI r11, 0x8000          ; Logits vector
G_LI r12, 256             ; Classes/length
G_LI r13, 0x8100          ; Temp scalar addr for max
REDUCE_MAX r13, r11, r12
G_LI r14, 0x8200          ; Output probabilities
VEC_SOFTMAX_0 r14, r11, r0, r12 ; input_num=1 (Z=0)

; Vector-scalar multiply
G_LI r15, 0x9000          ; Input vector
G_LI r16, 0xA000          ; Output vector
G_LI r17, 128             ; Length
G_LI r18, 0x3F800000      ; Scalar in GRF (1.0 as bits)
VEC_VSMUL_1 r16, r15, r18, r17 ; input_num=2 (Z=1)

; Reduce sum to scalar
G_LI r19, 0xB000          ; Input vector
G_LI r20, 512             ; Length
G_LI r21, 0xB800          ; Output scalar addr
REDUCE_SUM r21, r19, r20

; Element-wise max of two vectors
G_LI r22, 0xC000          ; Vector a
G_LI r23, 0xD000          ; Vector b
G_LI r24, 0xE000          ; Output vector
G_LI r25, 256             ; Length
VEC_VVMAX_1 r24, r22, r23, r25 ; input_num=2 (Z=1)
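
The softmax example above pairs a max reduction with the softmax function. As a sketch of why the maximum matters, the Python model below builds a numerically stable softmax from steps that correspond to REDUCE_MAX, VEC_VS_SUB, VEC_V_EXP, REDUCE_SUM and VEC_VS_DIV. This decomposition is an assumption made for illustration; it does not describe how VEC_SOFTMAX is implemented in hardware, and the function name is hypothetical.

```python
import math


def softmax_from_primitives(logits):
    """Stable softmax composed from reduce and vector-scalar steps."""
    m = max(logits)                           # REDUCE_MAX
    shifted = [x - m for x in logits]         # VEC_VS_SUB: subtract the max
    exps = [math.exp(x) for x in shifted]     # VEC_V_EXP
    total = sum(exps)                         # REDUCE_SUM
    return [e / total for e in exps]          # VEC_VS_DIV


if __name__ == "__main__":
    probs = softmax_from_primitives([1.0, 2.0, 3.0])
    print(probs, sum(probs))
    # Large logits would overflow exp() without the max subtraction.
    print(softmax_from_primitives([1000.0, 1000.0]))  # [0.5, 0.5]
```

Subtracting the maximum keeps every exponent at or below zero, so exp() stays in range even for large logits while the resulting probabilities are unchanged.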