CIMFlow

Inter-Core Synchronization

Multi-core synchronization and barrier instructions

Inter-core synchronization instructions provide fine-grained coordination primitives for multi-core systems, enabling both pairwise synchronization and collective barrier operations across multiple cores.


WAIT

Performs pairwise synchronization between cores. The current core blocks until the specified synchronization point is reached. The operands select which core to wait for (or all cores) and the expected write count.

Bits | 31:26 | 25:21 | 20:16 | 15:11 | 10:0
Value | 111101 | rs | rt | rd | 0
Field | opcode | source id | sync id | expected writes | reserved

Syntax
WAIT rs, rt, rd
Operation
Block until sync point GRF[rt] on core GRF[rs] with GRF[rd] expected writes

WAIT works in conjunction with TAG for producer-consumer synchronization. When GRF[rs] is 0, WAIT blocks until global synchronization (any core); when GRF[rs] holds a specific core ID, WAIT blocks until that core's TAG.
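
As an illustration of the field layout above, the following Python sketch packs and unpacks a WAIT instruction word. The helper names `encode_wait` and `decode_wait` are hypothetical and not part of any CIMFlow tooling; the sketch assumes only the bit positions and opcode listed in the encoding table, and that each operand is a 5-bit register index.

```python
WAIT_OPCODE = 0b111101  # bits 31:26, taken from the WAIT encoding table


def encode_wait(rs: int, rt: int, rd: int) -> int:
    """Pack WAIT rs, rt, rd into a 32-bit word (hypothetical helper)."""
    for name, reg in (("rs", rs), ("rt", rt), ("rd", rd)):
        if not 0 <= reg < 32:  # each register field is 5 bits wide
            raise ValueError(f"{name} must fit in 5 bits, got {reg}")
    # Bits 10:0 are reserved and left as zero.
    return (WAIT_OPCODE << 26) | (rs << 21) | (rt << 16) | (rd << 11)


def decode_wait(word: int):
    """Unpack a WAIT word into (rs, rt, rd) register indices."""
    if (word >> 26) & 0x3F != WAIT_OPCODE:
        raise ValueError("not a WAIT instruction")
    return (word >> 21) & 0x1F, (word >> 16) & 0x1F, (word >> 11) & 0x1F


if __name__ == "__main__":
    word = encode_wait(1, 5, 6)  # WAIT r1, r5, r6
    print(f"0x{word:08X}", decode_wait(word))
```

The operands are register indices, not values: the source id, sync ID, and expected write count are read from GRF[rs], GRF[rt], and GRF[rd] at execution time.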


BARRIER

Performs a collective synchronization across multiple cores using a barrier protocol. All participating cores must reach the barrier before any can proceed, ensuring global consistency in multi-core computations.

Bits | 31:26 | 25:21 | 20:16 | 15:0
Value | 111110 | rs | rt | 0
Field | opcode | num cores | barrier ID | reserved

Syntax
BARRIER rs, rt
Operation
Sync GRF[rs] cores at barrier GRF[rt]

All participating cores must execute BARRIER with identical num cores (GRF[rs]) and barrier ID (GRF[rt]) values. Mismatched parameters cause deadlock.
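
Because a mismatch only shows up as a hang at run time, it can help to check barrier parameters before a program is loaded. The Python sketch below is one way such a check might look. The function `check_barrier_calls` and its input format are hypothetical, and the arrival-count check rests on an assumption not stated above: that a barrier releases once exactly num cores cores have arrived.

```python
def check_barrier_calls(calls):
    """Return a list of problems for one barrier group (hypothetical helper).

    calls maps core id -> (num_cores, barrier_id), i.e. the GRF[rs] and
    GRF[rt] values that core passes to BARRIER.
    """
    problems = []
    distinct = sorted(set(calls.values()))
    if len(distinct) > 1:
        problems.append(f"mismatched (num_cores, barrier_id) pairs: {distinct}")
    for num_cores, barrier_id in distinct:
        arrivals = sum(1 for pair in calls.values() if pair == (num_cores, barrier_id))
        # Assumption: the barrier releases once num_cores cores have arrived.
        if arrivals != num_cores:
            problems.append(
                f"barrier {barrier_id}: {arrivals} core(s) arrive, {num_cores} expected"
            )
    return problems


if __name__ == "__main__":
    ok = {core: (4, 7) for core in range(4)}
    bad = {0: (4, 7), 1: (4, 7), 2: (4, 7), 3: (3, 7)}  # core 3 disagrees
    print(check_barrier_calls(ok))
    print(check_barrier_calls(bad))
```

An empty list means every core agrees on both parameters and the number of arriving cores matches num cores; any entry in the list describes a configuration that would be expected to deadlock under the stated assumption.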


TAG

Marks a synchronization point on the current core, signaling completion of work to other cores waiting via the WAIT instruction.

Bits | 31:26 | 25:21 | 20:0
Value | 111111 | rs | 0
Field | opcode | sync id | reserved

Syntax
TAG rs
Operation
Signal synchronization point with ID in GRF[rs]

TAG is used in conjunction with WAIT to implement producer-consumer synchronization patterns. A core signals completion of a phase by executing TAG, and other cores unblock by using WAIT with the same sync ID.
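
The TAG/WAIT handshake can be modeled in software to reason about it before writing assembly. The Python sketch below is a behavioral model only, not a description of the hardware: the `SyncBoard` class is hypothetical, it covers only the global case (GRF[rs] = 0), and it assumes that the expected write count is the number of TAG executions with a matching sync ID.

```python
import threading
from collections import defaultdict


class SyncBoard:
    """Hypothetical software model of TAG/WAIT with a global source."""

    def __init__(self):
        self._cond = threading.Condition()
        self._tags = defaultdict(int)  # sync id -> number of TAGs seen

    def tag(self, sync_id):
        """Model of TAG: record one signal for sync_id and wake waiters."""
        with self._cond:
            self._tags[sync_id] += 1
            self._cond.notify_all()

    def wait(self, sync_id, expected, timeout=None):
        """Model of WAIT: block until `expected` TAGs for sync_id arrived."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._tags[sync_id] >= expected, timeout
            )


def run_producer_consumer():
    board = SyncBoard()
    memory = {}
    seen = []

    def producer():
        memory[0x1000] = 42  # write data first
        board.tag(100)       # then signal completion

    def consumer():
        if board.wait(100, expected=1, timeout=5.0):
            seen.append(memory[0x1000])  # read only after the signal

    threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print(run_producer_consumer())
```

The ordering is the point of the model: the producer writes before it tags, and the consumer reads only after its wait returns, so the read never observes missing data regardless of which thread starts first.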


Examples

; Example 1: Point-to-point synchronization
; Core 0: Producer
G_LI  r1, 0x1000
SC_ST r10, 0(r1)       ; Write data to memory
G_LI  r5, 100          ; Sync ID
TAG   r5               ; Signal completion

; Core 1: Consumer
G_LI  r1, 0            ; Source (any core/global)
G_LI  r5, 100          ; Sync ID (must match TAG)
G_LI  r6, 1            ; Expected writes
WAIT  r1, r5, r6       ; Block until Core 0's TAG
G_LI  r2, 0x1000
SC_LD r11, 0(r2)       ; Read data (guaranteed visible)

; Example 2: Multi-core synchronization (4 cores)
; Each core executes this pattern
G_LI  r10, 200         ; Sync ID base

; Core 0, 1, 2, 3 all do some work
; ... (computation code)

; Every core: Signal completion
G_LI  r5, 200
TAG   r5               ; Mark sync point

; Every core: Wait for all
G_LI  r1, 0            ; Source 0: global (any core)
G_LI  r5, 200          ; Sync ID
G_LI  r6, 4            ; Expect 4 TAGs, one per core
WAIT  r1, r5, r6       ; Block until all 4 cores have executed TAG

; Continue with next phase
; ... (aggregation code)

; Example 3: Multi-phase pipeline
G_LI  r10, 5000        ; Base sync ID
G_LI  r11, 0           ; Iteration counter
G_LI  r8, 4            ; num_cores (4 assumed here for illustration)

; loop_start:
; ... work phase code ...

; Barrier: compute dynamic sync_id
SC_ADD r5, r10, r11    ; sync_id = 5000 + iteration
TAG   r5               ; Core signals completion
WAIT  r0, r5, r8       ; Wait for all (r8 = num_cores; assumes r0 holds 0)

; Update iteration counter
SC_ADDI r11, r11, 1    ; iteration++

; Check termination
G_LI  r12, 100
BLT   r11, r12, -10    ; if (iteration < 100) jump back

; Example 4: Subset synchronization
; Group A (Cores 0-3): Use sync ID 100
G_LI  r5, 100
TAG   r5
G_LI  r1, 0
G_LI  r6, 4
WAIT  r1, r5, r6       ; Only Group A synchronizes

; Group B (Cores 4-7): Use sync ID 200
G_LI  r5, 200
TAG   r5
G_LI  r1, 0
G_LI  r6, 4
WAIT  r1, r5, r6       ; Only Group B synchronizes
