Inter-Core Synchronization
Multi-core synchronization and barrier instructions
Inter-core synchronization instructions provide fine-grained coordination primitives for multi-core systems, enabling both pairwise synchronization and collective barrier operations across multiple cores.
WAIT
Performs pairwise synchronization between cores. The current core blocks until a specified synchronization point is reached, with control over which core (or cores) to wait for and the expected write count.
WAIT rs, rt, rd
Block until sync point GRF[rt] on core GRF[rs] with GRF[rd] expected writes
WAIT works in conjunction with TAG for producer-consumer synchronization. When GRF[rs] is 0, WAIT blocks until global synchronization; when set to a specific core ID, it waits for that core's TAG.
BARRIER
Performs a collective synchronization across multiple cores using a barrier protocol. All participating cores must reach the barrier before any can proceed, ensuring global consistency in multi-core computations.
BARRIER rs, rt
Sync GRF[rs] cores at barrier GRF[rt]
All participating cores must execute BARRIER with identical num_cores (GRF[rs]) and barrier_id (GRF[rt]) values. Mismatched parameters will cause deadlock.
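The release rule described above can be sketched as a small single-threaded Python model. This is an illustrative sketch only, not the hardware protocol: the class name `BarrierModel`, the method `arrive`, and the reset-on-release behavior are assumptions made for the example. A `num_cores` mismatch is raised as an error here purely because a model cannot usefully deadlock.

```python
from collections import defaultdict


class BarrierModel:
    """Toy model of BARRIER: a barrier releases once num_cores cores arrive."""

    def __init__(self):
        # barrier_id -> set of core IDs that have arrived so far
        self.arrived = defaultdict(set)
        # barrier_id -> num_cores declared by the first arrival
        self.expected = {}

    def arrive(self, core_id, num_cores, barrier_id):
        """Model one core executing BARRIER with (num_cores, barrier_id).

        Returns True if this arrival releases the barrier, else False.
        Raises ValueError on a num_cores mismatch, which the text says
        would deadlock on real hardware.
        """
        declared = self.expected.setdefault(barrier_id, num_cores)
        if declared != num_cores:
            raise ValueError(
                f"barrier {barrier_id}: num_cores mismatch "
                f"({declared} vs {num_cores})"
            )
        self.arrived[barrier_id].add(core_id)
        if len(self.arrived[barrier_id]) == declared:
            # Assumption: a released barrier ID is reset and can be reused.
            del self.arrived[barrier_id]
            del self.expected[barrier_id]
            return True
        return False


model = BarrierModel()
results = [model.arrive(core, 4, 7) for core in range(4)]
print(results)  # [False, False, False, True]
```

In this model a mismatched barrier_id does not raise: each core simply waits at a different barrier that never fills, which mirrors the deadlock the text warns about.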
TAG
Marks a synchronization point on the current core, signaling completion of work to other cores waiting via the WAIT instruction.
TAG rs
Signal synchronization point with ID in GRF[rs]
TAG is used in conjunction with WAIT to implement producer-consumer synchronization patterns. A core signals completion of a phase by executing TAG, and other cores unblock by using WAIT with the same sync ID.
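The TAG/WAIT pairing can be illustrated with a minimal Python model. It rests on assumptions not stated in this reference: each TAG is counted as one "write" to its sync ID, and a WAIT is satisfied once the count reaches the expected value. The names `SyncModel`, `tag`, and `wait_ready` are hypothetical. Because a source of 0 is taken to mean "global", this sketch cannot express waiting on core 0 specifically.

```python
from collections import Counter

GLOBAL = 0  # GRF[rs] == 0 selects global synchronization, per the WAIT text


class SyncModel:
    """Toy model of TAG/WAIT bookkeeping (assumed counting semantics)."""

    def __init__(self):
        # (core_id, sync_id) -> number of TAGs executed
        self.tags = Counter()

    def tag(self, core_id, sync_id):
        """Model core_id executing TAG with the given sync ID."""
        self.tags[(core_id, sync_id)] += 1

    def wait_ready(self, source, sync_id, expected):
        """Return True if WAIT source, sync_id, expected would unblock."""
        if source == GLOBAL:
            # Assumption: a global wait counts TAGs from every core.
            seen = sum(
                count
                for (_, sid), count in self.tags.items()
                if sid == sync_id
            )
        else:
            seen = self.tags[(source, sync_id)]
        return seen >= expected


model = SyncModel()
print(model.wait_ready(GLOBAL, 100, 1))  # False: no TAG yet
model.tag(0, 100)                        # producer signals sync ID 100
print(model.wait_ready(GLOBAL, 100, 1))  # True: one TAG seen
print(model.wait_ready(GLOBAL, 200, 1))  # False: different sync ID
```

The last call shows why the sync ID passed to WAIT must match the one passed to TAG: a TAG on ID 100 never satisfies a WAIT on ID 200.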
Examples
; Example 1: Point-to-point synchronization
; Core 0: Producer
G_LI r1, 0x1000
SC_ST r10, 0(r1) ; Write data to memory
G_LI r5, 100 ; Sync ID
TAG r5 ; Signal completion
; Core 1: Consumer
G_LI r1, 0 ; Source (any core/global)
G_LI r5, 100 ; Sync ID (must match TAG)
G_LI r6, 1 ; Expected writes
WAIT r1, r5, r6 ; Block until Core 0's TAG
G_LI r2, 0x1000
SC_LD r11, 0(r2) ; Read data (guaranteed visible)
; Example 2: Multi-core synchronization (4 cores)
; Each core executes this pattern
G_LI r10, 200 ; Sync ID base
; Core 0, 1, 2, 3 all do some work
; ... (computation code)
; Every core: Signal completion
G_LI r5, 200
TAG r5 ; Mark sync point
; Every core: Wait for all 4 TAGs
G_LI r1, 0 ; Wait for any/all cores
G_LI r5, 200 ; Sync ID
G_LI r6, 4 ; Expect 4 cores
WAIT r1, r5, r6 ; Block until all 4 TAG
; Continue with next phase
; ... (aggregation code)
; Example 3: Multi-phase pipeline
G_LI r10, 5000 ; Base sync ID
G_LI r11, 0 ; Iteration counter
; loop_start:
; ... work phase code ...
; Barrier: compute dynamic sync_id
SC_ADD r5, r10, r11 ; sync_id = 5000 + iteration
TAG r5 ; Core signals completion
WAIT r0, r5, r8 ; Wait for all (r8 must already hold num_cores; r0 is assumed to hold 0)
; Update iteration counter
SC_ADDI r11, r11, 1 ; iteration++
; Check termination
G_LI r12, 100
BLT r11, r12, -10 ; if (iteration < 100) jump back
; Example 4: Subset synchronization
; Group A (Cores 0-3): Use sync ID 100
G_LI r5, 100
TAG r5
G_LI r1, 0
G_LI r6, 4
WAIT r1, r5, r6 ; Only Group A synchronizes
; Group B (Cores 4-7): Use sync ID 200
G_LI r5, 200
TAG r5
G_LI r1, 0
G_LI r6, 4
WAIT r1, r5, r6 ; Only Group B synchronizes