Step-by-Step Compilation

Estimated time: ~5 minutes

This tutorial breaks down the CIMFlow pipeline into individual stages, allowing you to inspect intermediate representations at each step.

Prerequisite: Complete the Quick Start tutorial first to understand the end-to-end pipeline.

Pipeline Overview

The compilation flow consists of three main stages, each producing inspectable intermediate outputs:

ONNX → CG Compile → CG IR → OP Compile → ISA → Simulate → Report

Setup

Create output directories

mkdir -p output/stages/{cg,op,sim}

Set the model path

MODEL="data/models/resnet18.onnx"

Stage 1: CG-Level Compilation

CG (Compute Graph) compilation transforms the ONNX model into high-level CIM operations.

What CG Compilation Does

Parse ONNX Model: Load and validate the neural network graph
Analyze Dependencies: Build the layer dependency graph
Partition Computation: Partition the graph into stages across cores
Generate Schedule: Create the high-level operation ordering

Run CG Compilation

CG compilation command

cimflow compile cg \
    -m "$MODEL" \
    -o output/stages/cg \
    -l VERBOSE

The progress bar tracks the dynamic-programming solver as it searches for the optimal graph-to-stage mapping. Each "prefix" is a subgraph being optimized.
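To make the prefix idea concrete, here is a toy sketch in Python. It is not CIMFlow's solver: the linear chain of layers, the per-stage capacity limit, and the cost model (a fixed per-stage overhead plus the latency of the slowest layer in the stage) are all assumptions chosen for illustration, and the function name partition_into_stages is hypothetical.

```python
def partition_into_stages(weights, latencies, capacity, stage_overhead):
    """Split a chain of layers into contiguous stages with minimum total cost.

    Toy cost model (an assumption, not CIMFlow's): each stage costs
    stage_overhead plus the latency of its slowest layer, and the summed
    weight size of a stage must not exceed capacity.
    """
    n = len(weights)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: minimum cost for the prefix of the first i layers
    cut = [0] * (n + 1)      # cut[i]: start index of the last stage in that optimum
    best[0] = 0.0

    for i in range(1, n + 1):
        used = 0
        slowest = 0.0
        # Try every feasible last stage covering layers j .. i-1.
        for j in range(i - 1, -1, -1):
            used += weights[j]
            if used > capacity:
                break
            slowest = max(slowest, latencies[j])
            cost = best[j] + stage_overhead + slowest
            if cost < best[i]:
                best[i], cut[i] = cost, j

    if best[n] == inf:
        raise ValueError("at least one layer does not fit in a single stage")

    # Walk the cut points backwards to recover the stage boundaries.
    stages = []
    i = n
    while i > 0:
        stages.append((cut[i], i))
        i = cut[i]
    stages.reverse()
    return best[n], stages


cost, stages = partition_into_stages(
    weights=[4, 4, 4, 4], latencies=[1, 5, 1, 1], capacity=8, stage_overhead=1
)
print(cost, stages)
```

Each entry of best corresponds to one prefix of the layer chain, which mirrors the way the progress bar counts prefixes. The real compiler works on a graph rather than a chain, so treat this only as intuition for the recurrence.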

Inspect CG Output

List the output files:

ls output/stages/cg/

View the CG instructions:

head -30 output/stages/cg/instructions_*.json

{
    "metadata": {
        "op_count": 1110836224
    },
    "core_0_0": {
        "stages": {
            "0": {
                "cluster_id": "/conv1.0/Conv",
                "weight_replica_id": 0,
                "instructions": [
                    {
                        "op": "read",
                        "attr": {
                            "tensor_type": "weight",
                            "shape": [64, 3, 3, 3],
                            "inst_group_id": "core_0.stage_0.read_weight.inst_0"
                        }
                    },
                    {
                        "op": "read",
                        "attr": {
                            "tensor_type": "feature",
                            "shape": [1, 3, 32, 32],
                            "inst_group_id": "core_0.stage_0.read_feature.inst_1"
                        }
                    },
                    {
                        "op": "add",
                        "attr": {
                            "shape": [1, 64, 32, 32],
                            "inst_group_id": "core_0.stage_0.add.inst_2"
                        }
                    },
                    {
                        "op": "conv",
                        "attr": {
                            "X_shape": [1, 3, 32, 32],
                            "W_shape": [64, 3, 3, 3],
                            "padding": [1, 1, 1, 1],
                            "strides": [1, 1],
                            "inst_group_id": "core_0.stage_0.conv.inst_3"
                        }
                    },
                    {
                        "op": "write",
                        "attr": {
                            "tensor_type": "feature",
                            "shape": [1, 64, 32, 32],
                            "inst_group_id": "core_0.stage_0.write_feature.inst_4",
                            "write_id": "node_0_batch_0"
                        }
                    }
                ]
            }
        }
    }
}

The CG instructions are organized by core and stage. The op types include:

read: Load weights or feature data from memory
add: Vector addition for accumulation
conv: Execute convolution operation with specified shapes and padding
write: Write feature data back to memory
send: Transfer data to another core via NoC
receive: Receive data from another core via NoC

Each instruction includes an inst_group_id for tracing and debugging.

Count operations:

grep -c '"op":' output/stages/cg/instructions_*.json
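grep -c reports one number of matching lines per file. For a per-core, per-stage breakdown, a short Python sketch can walk the structure shown in the sample output. The function name count_cg_ops is hypothetical, and the layout is assumed to match the sample (a top-level "metadata" entry plus one entry per core); the embedded data is a trimmed copy of that sample.

```python
from collections import Counter


def count_cg_ops(cg):
    """Return {(core, stage_id): Counter of op names} for a CG instruction dict."""
    counts = {}
    for core, body in cg.items():
        if core == "metadata":          # not a core entry
            continue
        for stage_id, stage in body["stages"].items():
            counts[(core, stage_id)] = Counter(
                inst["op"] for inst in stage["instructions"]
            )
    return counts


# Trimmed copy of the sample output above (attributes omitted).
sample_cg = {
    "metadata": {"op_count": 1110836224},
    "core_0_0": {
        "stages": {
            "0": {
                "cluster_id": "/conv1.0/Conv",
                "instructions": [
                    {"op": "read"},
                    {"op": "read"},
                    {"op": "add"},
                    {"op": "conv"},
                    {"op": "write"},
                ],
            }
        }
    },
}

cg_counts = count_cg_ops(sample_cg)
for (core, stage_id), ops in cg_counts.items():
    print(core, stage_id, dict(ops))
```

To run it on a real file, load the file with json.load and pass the resulting dict to count_cg_ops.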

Stage 2: OP-Level Compilation

OP (Operator) compilation transforms CG instructions into ISA instructions for each core.

What OP Compilation Does

Process CG Instructions: Parse the high-level operation schedule
Generate ISA per Core: Create instruction sequences for each core
Memory Allocation: SRAM tiling and weight placement
Instruction Scheduling: Optimize execution order and synchronization

Run OP Compilation

OP compilation command

# Pick the most recent CG output (ls -t sorts newest first)
CG_FILE=$(ls -t output/stages/cg/instructions_*.json | head -1)

cimflow compile op \
    -i "$CG_FILE" \
    -o output/stages/op \
    -t 8 -b 16 \
    -l VERBOSE

Each core receives its own instruction sequence. The compiler handles SRAM tiling, weight loading, and synchronization between cores.

Inspect ISA Output

List the output:

ls output/stages/op/

View ISA instructions (JSON format):

head -40 output/stages/op/isa_instructions_*.json

{"metadata":{"op_count":1110836224},
"0":[
{"opcode":45,"rd":16,"imm":8},
{"opcode":45,"rd":17,"imm":8},
{"opcode":45,"rd":20,"imm":8},
{"opcode":44,"rd":0,"imm":1060352},
{"opcode":44,"rd":1,"imm":0},
{"opcode":44,"rd":2,"imm":1728},
{"opcode":48,"rs":0,"rt":2,"rd":1,"imm":0},
{"opcode":44,"rd":0,"imm":1060352},
{"opcode":44,"rd":1,"imm":524288},
{"opcode":44,"rd":2,"imm":3072},
{"opcode":48,"rs":0,"rt":2,"rd":1,"imm":0},
{"opcode":20,"rs":1,"rt":2,"rd":4,"re":3,"funct":0}
]}

The ISA instructions use numeric opcodes:

opcode 44/45: Load immediate values (G_LI, S_LI)
opcode 48/50: Memory copy operations (MEM_CPY)
opcode 20: Vector operations (e.g., VEC_ADD_1)
opcode 32/36: Scalar arithmetic

Instructions are grouped by core ID (e.g., "0" for core 0).

Count total instructions:

grep -c '"opcode":' output/stages/op/isa_instructions_*.json
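A per-core opcode histogram is often more informative than a single total. The sketch below assumes the layout of the sample listing (a "metadata" entry plus one instruction list per core ID); opcode_histogram is a hypothetical helper name, and the embedded JSON is the sample from this page.

```python
import json
from collections import Counter

# The sample ISA listing from this page.
SAMPLE_ISA = """
{"metadata":{"op_count":1110836224},
"0":[
{"opcode":45,"rd":16,"imm":8},
{"opcode":45,"rd":17,"imm":8},
{"opcode":45,"rd":20,"imm":8},
{"opcode":44,"rd":0,"imm":1060352},
{"opcode":44,"rd":1,"imm":0},
{"opcode":44,"rd":2,"imm":1728},
{"opcode":48,"rs":0,"rt":2,"rd":1,"imm":0},
{"opcode":44,"rd":0,"imm":1060352},
{"opcode":44,"rd":1,"imm":524288},
{"opcode":44,"rd":2,"imm":3072},
{"opcode":48,"rs":0,"rt":2,"rd":1,"imm":0},
{"opcode":20,"rs":1,"rt":2,"rd":4,"re":3,"funct":0}
]}
"""


def opcode_histogram(isa):
    """Map each core ID to a Counter of its opcodes, skipping 'metadata'."""
    return {
        core: Counter(inst["opcode"] for inst in insts)
        for core, insts in isa.items()
        if core != "metadata"
    }


hist = opcode_histogram(json.loads(SAMPLE_ISA))
for core, ops in hist.items():
    print(f"core {core}: {sum(ops.values())} instructions, by opcode {dict(ops)}")
```

For a real run, replace json.loads(SAMPLE_ISA) with json.load on the opened isa_instructions_*.json file.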

Converting JSON to Assembly

The JSON format is machine-readable but verbose. Convert to assembly for easier inspection:

Convert JSON to assembly format

# Pick the most recent ISA output (ls -t sorts newest first)
ISA_FILE=$(ls -t output/stages/op/isa_instructions_*.json | head -1)

cim-compiler convert \
    --src-type json \
    --dst-type asm \
    --src-file "$ISA_FILE" \
    --dst-file output/stages/op/isa_instructions.asm

When to use each format

JSON: Programmatic analysis, parsing with scripts
Assembly: Manual debugging, understanding instruction flow

View Assembly Output

head -30 output/stages/op/isa_instructions.asm

# Core 0
S_LI VEC_IBW1, 8
S_LI VEC_IBW2, 8
S_LI VEC_OBW, 8
G_LI r0, 1060352
G_LI r1, 0
G_LI r2, 1728
MEM_CPY r1, r0, r2, 0
G_LI r0, 1060352
G_LI r1, 524288
G_LI r2, 3072
MEM_CPY r1, r0, r2, 0
S_LI CIM_IBW, 8
S_LI CIM_WBW, 8
S_LI CIM_OBW, 32
S_LI CIM_GSZ, 8
S_LI CIM_AG, 16
VEC_ADD_1 r4, r1, r2, r3

The assembly format shows readable mnemonics:

S_LI/G_LI: Load immediate to special/general register
MEM_CPY: Memory copy between addresses
VEC_ADD_1: Vector addition operation
CIM_*: CIM configuration registers (bit widths, group sizes)
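Comparing the JSON and assembly samples side by side suggests how fields map to operands. The sketch below encodes only what those two listings show: the opcode-to-mnemonic table, the special-register names for rd 16, 17, and 20, and the operand order are inferred from the samples, not taken from an ISA specification, and to_asm is a hypothetical helper. Use cim-compiler convert for real conversions.

```python
import json

# Special-register names inferred from the order of the S_LI lines in the samples.
SPECIAL_REGS = {16: "VEC_IBW1", 17: "VEC_IBW2", 20: "VEC_OBW"}


def to_asm(inst):
    """Render one JSON instruction as assembly text, for the sampled opcodes only."""
    op = inst["opcode"]
    if op == 44:    # load immediate into a general register
        return f"G_LI r{inst['rd']}, {inst['imm']}"
    if op == 45 and inst["rd"] in SPECIAL_REGS:    # load immediate into a special register
        return f"S_LI {SPECIAL_REGS[inst['rd']]}, {inst['imm']}"
    if op == 48:    # operand order in the sample is rd, rs, rt, imm
        return f"MEM_CPY r{inst['rd']}, r{inst['rs']}, r{inst['rt']}, {inst['imm']}"
    if op == 20 and inst.get("funct") == 0:    # operand order in the sample is rd, rs, rt, re
        return f"VEC_ADD_1 r{inst['rd']}, r{inst['rs']}, r{inst['rt']}, r{inst['re']}"
    # Anything not covered by the samples is left as a comment.
    return f"# unknown: {json.dumps(inst)}"


sample = [
    {"opcode": 45, "rd": 16, "imm": 8},
    {"opcode": 44, "rd": 0, "imm": 1060352},
    {"opcode": 48, "rs": 0, "rt": 2, "rd": 1, "imm": 0},
    {"opcode": 20, "rs": 1, "rt": 2, "rd": 4, "re": 3, "funct": 0},
]
asm_lines = [to_asm(inst) for inst in sample]
print("\n".join(asm_lines))
```

Note that the destination register comes first in the assembly text even though the JSON lists rs and rt before rd.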

Stage 3: Simulation

Cycle-accurate simulation executes ISA instructions on the modeled CIM hardware.

What Simulation Does

Cycle-by-Cycle Execution: Precise timing for all instructions
CIM Array Modeling: Compute-in-memory operation timing
NoC Simulation: Inter-core communication delays
Energy Tracking: Power and energy estimation

Run Simulation

Simulation command

# Pick the most recent ISA output (ls -t sorts newest first)
ISA_FILE=$(ls -t output/stages/op/isa_instructions_*.json | head -1)

cimflow sim \
    -i "$ISA_FILE" \
    -o output/stages/sim \
    -t 8 -b 16 \
    -l VERBOSE

The simulator models all 64 cores in parallel, tracking data movement through the NoC and CIM array operations. Power is estimated from a calibrated energy model.

View Simulation Results

cat output/stages/sim/simulation_*.txt

Sample output:

Simulation Result:
  - latency:            2.08837 ms
  - average power:      4420.4895 mW
  - total energy:       9231608896.0073 pJ/it
  - TOPS:               0.5319
  - TOPS/W:             0.1203
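The figures in this report are mutually consistent, which gives a quick sanity check when reading your own results. The Python sketch below parses the sample report and recomputes energy, TOPS, and TOPS/W from latency and average power. It assumes the op_count value in the JSON metadata (1110836224) is the operation count behind the TOPS figure, which this tutorial does not state explicitly; parse_report and derive are hypothetical helper names.

```python
import re

REPORT = """\
Simulation Result:
  - latency:            2.08837 ms
  - average power:      4420.4895 mW
  - total energy:       9231608896.0073 pJ/it
  - TOPS:               0.5319
  - TOPS/W:             0.1203
"""

OP_COUNT = 1110836224  # "op_count" from the CG/ISA JSON metadata


def parse_report(text):
    """Return {metric name: value} for lines shaped like '- name: number unit'."""
    metrics = {}
    for line in text.splitlines():
        m = re.match(r"\s*-\s*(.+?):\s*([0-9.eE+-]+)", line)
        if m:
            metrics[m.group(1)] = float(m.group(2))
    return metrics


def derive(op_count, latency_ms, power_mw):
    """Recompute energy (pJ), TOPS, and TOPS/W from latency and average power."""
    latency_s = latency_ms * 1e-3
    power_w = power_mw * 1e-3
    energy_pj = power_w * latency_s * 1e12      # W * s = J, and 1 J = 1e12 pJ
    tops = op_count / latency_s / 1e12          # tera-operations per second
    return {"energy_pj": energy_pj, "tops": tops, "tops_per_w": tops / power_w}


reported = parse_report(REPORT)
derived = derive(OP_COUNT, reported["latency"], reported["average power"])
print(f"energy: {derived['energy_pj']:.4e} pJ (reported {reported['total energy']:.4e})")
print(f"TOPS:   {derived['tops']:.4f}, TOPS/W: {derived['tops_per_w']:.4f}")
```

The recomputed energy differs from the reported one only in the last digits, because the printed latency and power are rounded.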

Output Summary

After completing all stages, your output directory contains:

output/stages/
    cg/
        instructions_*.json
    op/
        isa_instructions_*.json
        isa_instructions.asm
    sim/
        simulation_*.txt

Key Files

File	Description
`cg/instructions_*.json`	CG-level operations (read, conv, write, etc.)
`op/isa_instructions_*.json`	ISA instructions per core (JSON)
`op/isa_instructions.asm`	ISA instructions (human-readable assembly)
`sim/simulation_*.txt`	Performance report

The * in filenames represents a timestamp suffix (e.g., instructions_20240115_143052.json). This allows multiple runs without overwriting previous results.
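Once several runs have accumulated, scripts usually want the most recent file. A minimal Python sketch follows, assuming the suffix always has the YYYYMMDD_HHMMSS shape of the example above, so that sorting by name is the same as sorting by time. The helper name latest_output is hypothetical, and the demo creates empty files in a temporary directory rather than touching real outputs.

```python
import glob
import os
import tempfile


def latest_output(pattern):
    """Return the path matching pattern whose timestamp suffix sorts last."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no files match {pattern}")
    return matches[-1]


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("instructions_20240116_090000.json",
                 "instructions_20240115_143052.json"):
        open(os.path.join(tmp, name), "w").close()
    newest = os.path.basename(
        latest_output(os.path.join(tmp, "instructions_*.json"))
    )

print(newest)
```

If the suffix format ever differs from the example, sorting by modification time (os.path.getmtime) would be the safer key.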

Comparing Formats

JSON:

{"opcode":44,"rd":0,"imm":1060352},
{"opcode":44,"rd":1,"imm":0},
{"opcode":44,"rd":2,"imm":1728},
{"opcode":48,"rs":0,"rt":2,"rd":1,"imm":0},
{"opcode":20,"rs":1,"rt":2,"rd":4,"re":3,"funct":0}

Advantages:

Machine-readable
Complete metadata with register encodings
Easy to parse programmatically

Assembly:

G_LI r0, 1060352
G_LI r1, 0
G_LI r2, 1728
MEM_CPY r1, r0, r2, 0
VEC_ADD_1 r4, r1, r2, r3

Advantages:

Human-readable mnemonics
Compact representation
Familiar format for debugging

What You Learned

CG Compilation: Transform ONNX to high-level CIM operations
OP Compilation: Generate per-core ISA instructions
IR Inspection: Inspect JSON and assembly outputs
Simulation: Run cycle-accurate simulation and read the performance report
