
Step-by-Step Compilation

Run each compilation stage individually with IR inspection

Estimated time: ~5 minutes

This tutorial breaks down the CIMFlow pipeline into individual stages, allowing you to inspect intermediate representations at each step.

Prerequisite: Complete the Quick Start tutorial first to understand the end-to-end pipeline.


Pipeline Overview

The compilation flow consists of three main stages, each producing inspectable intermediate outputs:

ONNX → CG Compile → CG IR → OP Compile → ISA → Simulate → Report

Setup

Create output directories

mkdir -p output/stages/{cg,op,sim}

Set the model path

MODEL="data/models/resnet18.onnx"

Stage 1: CG-Level Compilation

CG (Compute Graph) compilation transforms the ONNX model into high-level CIM operations.

What CG Compilation Does

  • Parse ONNX Model: Load and validate the neural network graph
  • Analyze Dependencies: Build the layer dependency graph
  • Partition Computation: Partition the graph into stages across cores
  • Generate Schedule: Create the high-level operation ordering

Run CG Compilation

CG compilation command
cimflow compile cg \
    -m $MODEL \
    -o output/stages/cg \
    -l VERBOSE

While this runs, the progress bar tracks a dynamic-programming search for the optimal graph-to-stage mapping. Each "prefix" it reports is a subgraph being optimized.

Inspect CG Output

List the output files:

ls output/stages/cg/

View the CG instructions:

head -30 output/stages/cg/instructions_*.json
{
    "metadata": {
        "op_count": 1110836224
    },
    "core_0_0": {
        "stages": {
            "0": {
                "cluster_id": "/conv1.0/Conv",
                "weight_replica_id": 0,
                "instructions": [
                    {
                        "op": "read",
                        "attr": {
                            "tensor_type": "weight",
                            "shape": [64, 3, 3, 3],
                            "inst_group_id": "core_0.stage_0.read_weight.inst_0"
                        }
                    },
                    {
                        "op": "read",
                        "attr": {
                            "tensor_type": "feature",
                            "shape": [1, 3, 32, 32],
                            "inst_group_id": "core_0.stage_0.read_feature.inst_1"
                        }
                    },
                    {
                        "op": "add",
                        "attr": {
                            "shape": [1, 64, 32, 32],
                            "inst_group_id": "core_0.stage_0.add.inst_2"
                        }
                    },
                    {
                        "op": "conv",
                        "attr": {
                            "X_shape": [1, 3, 32, 32],
                            "W_shape": [64, 3, 3, 3],
                            "padding": [1, 1, 1, 1],
                            "strides": [1, 1],
                            "inst_group_id": "core_0.stage_0.conv.inst_3"
                        }
                    },
                    {
                        "op": "write",
                        "attr": {
                            "tensor_type": "feature",
                            "shape": [1, 64, 32, 32],
                            "inst_group_id": "core_0.stage_0.write_feature.inst_4",
                            "write_id": "node_0_batch_0"
                        }
                    }
                ]
            }
        }
    }
}

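CG instructions are grouped by core, then by stage. Each instruction has one of the following op types (send and receive do not appear in the excerpt above):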

  • read: Load weights or feature data from memory
  • add: Vector addition for accumulation
  • conv: Execute convolution operation with specified shapes and padding
  • write: Write feature data back to memory
  • send: Transfer data to another core via NoC
  • receive: Receive data from another core via NoC
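
The shapes in the sample conv instruction can be checked by hand. The sketch below applies the standard convolution output-size formula, assuming a dilation of 1 (the excerpt shows no dilation attribute) and that the four padding values cover the beginning and end of each spatial axis. The helper names conv_out_dim and num_elements are hypothetical and are not part of CIMFlow.

```shell
#!/bin/sh
# Output size along one spatial axis of a convolution (dilation assumed 1).
# usage: conv_out_dim INPUT PAD_BEGIN PAD_END KERNEL STRIDE
conv_out_dim() {
    echo $(( ($1 + $2 + $3 - $4) / $5 + 1 ))
}

# Number of elements in a tensor, given its dimensions as arguments.
num_elements() {
    n=1
    for d in "$@"; do n=$(( n * d )); done
    echo "$n"
}

# Values from the sample conv: X_shape [1,3,32,32], W_shape [64,3,3,3],
# padding [1,1,1,1], strides [1,1].
h=$(conv_out_dim 32 1 1 3 1)
w=$(conv_out_dim 32 1 1 3 1)
echo "output shape: [1, 64, $h, $w]"   # 64 output channels come from W_shape[0]
echo "weight elements:  $(num_elements 64 3 3 3)"
echo "feature elements: $(num_elements 1 3 32 32)"
```

The computed output shape agrees with the [1, 64, 32, 32] shape of the write instruction in the excerpt. The element counts come out to 1728 and 3072, the same values that are loaded into r2 before the two MEM_CPY instructions in the Stage 2 listing. That would be consistent with 8-bit elements, but this is an observation from the samples rather than a documented guarantee.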

Each instruction includes an inst_group_id for tracing and debugging.
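
The IDs in the excerpt follow a dotted core.stage.op.inst pattern. As a minimal sketch, assuming that four-field pattern and the pretty-printed layout shown above hold for the whole file (both are inferred from the sample, not from a documented schema), the hypothetical helpers below list the IDs in a file and split one into its parts.

```shell
#!/bin/sh
# Print every inst_group_id in a CG instruction file, one per line.
# Assumes the pretty-printed layout above (one "inst_group_id" per line).
list_group_ids() {
    sed -n 's/.*"inst_group_id": *"\([^"]*\)".*/\1/p' "$1"
}

# Split an ID like "core_0.stage_0.read_weight.inst_0" into labeled fields.
# Assumes the four-field core.stage.op.inst pattern seen in the sample.
split_group_id() {
    printf '%s\n' "$1" | awk -F. '{
        sub(/^core_/, "", $1); sub(/^stage_/, "", $2); sub(/^inst_/, "", $4)
        printf "core=%s stage=%s op=%s inst=%s\n", $1, $2, $3, $4
    }'
}

split_group_id "core_0.stage_0.read_weight.inst_0"
# prints: core=0 stage=0 op=read_weight inst=0

# Against real output (path pattern taken from this tutorial):
#   list_group_ids output/stages/cg/instructions_*.json |
#       while read -r id; do split_group_id "$id"; done
```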

Count operations:

grep -c '"op":' output/stages/cg/instructions_*.json
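
A single total hides the mix of operations. The sketch below breaks the count down by op type using only standard text tools. The function name op_histogram is hypothetical, and the demo runs on a small stand-in file rather than real compiler output.

```shell
#!/bin/sh
# Count CG instructions per op type instead of a single total.
# usage: op_histogram FILE
op_histogram() {
    grep -o '"op": *"[^"]*"' "$1" \
        | sed 's/.*"\([^"]*\)"$/\1/' \
        | sort | uniq -c \
        | awk '{ printf "%s %d\n", $2, $1 }'
}

# Demo on a tiny stand-in file with the five ops from the sample above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"op": "read"}
{"op": "read"}
{"op": "add"}
{"op": "conv"}
{"op": "write"}
EOF
op_histogram "$sample"
rm -f "$sample"
```

To run it on real output, pass one instruction file, for example the path stored in a variable after listing output/stages/cg/.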

Stage 2: OP-Level Compilation

OP (Operator) compilation transforms CG instructions into ISA instructions for each core.

What OP Compilation Does

  • Process CG Instructions: Parse the high-level operation schedule
  • Generate ISA per Core: Create instruction sequences for each core
  • Memory Allocation: SRAM tiling and weight placement
  • Instruction Scheduling: Optimize execution order and synchronization

Run OP Compilation

OP compilation command
# Pick the most recent CG output (ls -t lists the newest file first)
CG_FILE=$(ls -t output/stages/cg/instructions_*.json | head -1)

cimflow compile op \
    -i "$CG_FILE" \
    -o output/stages/op \
    -t 8 -b 16 \
    -l VERBOSE

Each core receives its own instruction sequence. The compiler handles SRAM tiling, weight loading, and synchronization between cores.

Inspect ISA Output

List the output:

ls output/stages/op/

View ISA instructions (JSON format):

head -40 output/stages/op/isa_instructions_*.json
{"metadata":{"op_count":1110836224},
"0":[
{"opcode":45,"rd":16,"imm":8},
{"opcode":45,"rd":17,"imm":8},
{"opcode":45,"rd":20,"imm":8},
{"opcode":44,"rd":0,"imm":1060352},
{"opcode":44,"rd":1,"imm":0},
{"opcode":44,"rd":2,"imm":1728},
{"opcode":48,"rs":0,"rt":2,"rd":1,"imm":0},
{"opcode":44,"rd":0,"imm":1060352},
{"opcode":44,"rd":1,"imm":524288},
{"opcode":44,"rd":2,"imm":3072},
{"opcode":48,"rs":0,"rt":2,"rd":1,"imm":0},
{"opcode":20,"rs":1,"rt":2,"rd":4,"re":3,"funct":0}
]}

The ISA instructions use numeric opcodes:

  • opcode 44/45: Load immediate values (G_LI, S_LI)
  • opcode 48/50: Memory copy operations (MEM_CPY)
  • opcode 20: Vector operations (VEC_ADD)
  • opcode 32/36: Scalar arithmetic

Instructions are grouped by core ID (e.g., "0" for core 0).
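
To see which opcodes dominate a program, the numeric codes can be tallied and labeled. This is a rough sketch: the opcode-to-mnemonic table covers only opcodes 20, 44, 45 and 48 as named in the list above, it is not the full ISA, and the function name opcode_histogram is hypothetical.

```shell
#!/bin/sh
# Count ISA instructions per opcode and attach the mnemonics named above.
# Only opcodes this tutorial names are labeled; all others print "unknown".
# usage: opcode_histogram FILE
opcode_histogram() {
    grep -o '"opcode": *[0-9]*' "$1" | awk -F: '
        BEGIN { name[20] = "VEC_ADD"; name[44] = "G_LI"
                name[45] = "S_LI";    name[48] = "MEM_CPY" }
        { count[$2 + 0]++ }
        END {
            for (op in count)
                printf "%d %s %d\n", op, ((op in name) ? name[op] : "unknown"), count[op]
        }' | sort -n
}

# Demo on four instructions copied from the sample output above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"opcode":45,"rd":16,"imm":8},
{"opcode":44,"rd":0,"imm":1060352},
{"opcode":44,"rd":1,"imm":0},
{"opcode":48,"rs":0,"rt":2,"rd":1,"imm":0}
EOF
opcode_histogram "$sample"
rm -f "$sample"
```

Each output line has the form "opcode mnemonic count", sorted by opcode number.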

Count total instructions:

grep -c '"opcode":' output/stages/op/isa_instructions_*.json
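
Because instructions are grouped by core, a per-core count can reveal load imbalance. The sketch below assumes the line-oriented layout of the sample (a line such as "0":[ opens a core and each instruction sits on its own line). The layout for additional cores is an assumption extrapolated from the single core shown, and per_core_counts is a hypothetical name.

```shell
#!/bin/sh
# Count ISA instructions per core.
# Assumes a core starts on a line like "0":[ with one instruction per line.
# usage: per_core_counts FILE
per_core_counts() {
    awk '
        /^"[0-9]+":/ { core = $0; sub(/^"/, "", core); sub(/".*$/, "", core) }
        /"opcode":/  { count[core]++ }
        END { for (c in count) printf "core %s: %d instructions\n", c, count[c] }
    ' "$1" | sort -k2,2n
}

# Demo on a stand-in file with two cores (layout assumed, see above).
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"metadata":{"op_count":5},
"0":[
{"opcode":44,"rd":0,"imm":1},
{"opcode":44,"rd":1,"imm":0},
{"opcode":48,"rs":0,"rt":2,"rd":1,"imm":0}
],
"1":[
{"opcode":45,"rd":16,"imm":8},
{"opcode":20,"rs":1,"rt":2,"rd":4,"re":3,"funct":0}
]}
EOF
per_core_counts "$sample"
rm -f "$sample"
```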

Converting JSON to Assembly

The JSON format is machine-readable but verbose. Convert to assembly for easier inspection:

Convert JSON to assembly format
# Pick the most recent ISA output (ls -t lists the newest file first)
ISA_FILE=$(ls -t output/stages/op/isa_instructions_*.json | head -1)

cim-compiler convert \
    --src-type json \
    --dst-type asm \
    --src-file "$ISA_FILE" \
    --dst-file output/stages/op/isa_instructions.asm

When to use each format

  • JSON: Programmatic analysis, parsing with scripts
  • Assembly: Manual debugging, understanding instruction flow

View Assembly Output

head -30 output/stages/op/isa_instructions.asm
# Core 0
S_LI VEC_IBW1, 8
S_LI VEC_IBW2, 8
S_LI VEC_OBW, 8
G_LI r0, 1060352
G_LI r1, 0
G_LI r2, 1728
MEM_CPY r1, r0, r2, 0
G_LI r0, 1060352
G_LI r1, 524288
G_LI r2, 3072
MEM_CPY r1, r0, r2, 0
S_LI CIM_IBW, 8
S_LI CIM_WBW, 8
S_LI CIM_OBW, 32
S_LI CIM_GSZ, 8
S_LI CIM_AG, 16
VEC_ADD_1 r4, r1, r2, r3

The assembly format shows readable mnemonics:

  • S_LI/G_LI: Load immediate to special/general register
  • MEM_CPY: Memory copy between addresses
  • VEC_ADD_1: Vector addition operation
  • CIM_*: CIM configuration registers (bit widths, group sizes)

Special register names describe their role: VEC_IBW1, for example, holds a vector input bit width.
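
The S_LI lines make it easy to pull out how each core configures its special registers. As a small sketch, assuming every core section starts with a "# Core N" comment as in the listing above, the hypothetical helper special_regs prints one register setting per line.

```shell
#!/bin/sh
# List special-register settings (S_LI instructions) per core.
# Assumes each core section begins with a "# Core N" comment line.
# usage: special_regs FILE
special_regs() {
    awk '
        /^# Core/    { core = $3 }
        $1 == "S_LI" { reg = $2; sub(/,$/, "", reg)
                       printf "core %s: %s = %s\n", core, reg, $3 }
    ' "$1"
}

# Demo on lines copied from the assembly listing above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Core 0
S_LI VEC_IBW1, 8
G_LI r0, 1060352
S_LI CIM_OBW, 32
S_LI CIM_AG, 16
EOF
special_regs "$sample"
rm -f "$sample"
```

On the demo input this lists VEC_IBW1, CIM_OBW and CIM_AG with their values and skips the G_LI line, since G_LI targets a general register.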


Stage 3: Simulation

Cycle-accurate simulation executes ISA instructions on the modeled CIM hardware.

What Simulation Does

Cycle-by-Cycle Execution
Precise timing for all instructions
CIM Array Modeling
Compute-in-memory operation timing
NoC Simulation
Inter-core communication delays
Energy Tracking
Power and energy estimation

Run Simulation

Simulation command
# Pick the most recent ISA output (ls -t lists the newest file first)
ISA_FILE=$(ls -t output/stages/op/isa_instructions_*.json | head -1)

cimflow sim \
    -i "$ISA_FILE" \
    -o output/stages/sim \
    -t 8 -b 16 \
    -l VERBOSE

The simulator models all 64 cores in parallel, tracking data movement through the NoC and CIM array operations. Power is estimated from a calibrated energy model.

View Simulation Results

cat output/stages/sim/simulation_*.txt

Sample output:

Simulation Result:
  - latency:            2.08837 ms
  - average power:      4420.4895 mW
  - total energy:       9231608896.0073 pJ/it
  - TOPS:               0.5319
  - TOPS/W:             0.1203
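
The reported figures are related to each other, and checking those relations is a quick sanity test. The sketch below assumes the report keeps the "- label: value unit" layout shown above, that energy equals average power times latency, and that TOPS is the op_count from the JSON metadata divided by latency. These relations agree with the sample numbers but are inferred, not taken from the simulator's documentation. The names report_value and check_report are hypothetical.

```shell
#!/bin/sh
# Extract one numeric field from a simulation report.
# usage: report_value FILE LABEL     (e.g. LABEL = "latency")
report_value() {
    awk -F: -v label="$2" '
        { key = $1; sub(/^[ \t]*-[ \t]*/, "", key) }
        key == label { split($2, parts, " "); print parts[1]; exit }
    ' "$1"
}

# Recompute energy, TOPS and TOPS/W from latency, power and an op count.
# usage: check_report FILE OP_COUNT
check_report() {
    lat=$(report_value "$1" "latency")          # milliseconds
    pow=$(report_value "$1" "average power")    # milliwatts
    awk -v lat_ms="$lat" -v pow_mw="$pow" -v ops="$2" 'BEGIN {
        energy_pj = pow_mw * lat_ms * 1e6       # mW * ms = uJ = 1e6 pJ
        tops = ops / (lat_ms * 1e-3) / 1e12     # operations per second / 1e12
        printf "energy_pJ=%.0f TOPS=%.4f TOPS/W=%.4f\n",
               energy_pj, tops, tops / (pow_mw * 1e-3)
    }'
}

# Demo with the sample report and the op_count from the JSON metadata.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Simulation Result:
  - latency:            2.08837 ms
  - average power:      4420.4895 mW
  - total energy:       9231608896.0073 pJ/it
  - TOPS:               0.5319
  - TOPS/W:             0.1203
EOF
check_report "$sample" 1110836224
rm -f "$sample"
```

The recomputed TOPS and TOPS/W match the report to four decimal places. The recomputed energy differs from the reported value only from the sixth significant digit on, which is what rounding of the printed latency would explain.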

Output Summary

After completing all stages, your output directory contains:

output/stages/
  cg/instructions_*.json
  op/isa_instructions_*.json
  op/isa_instructions.asm
  sim/simulation_*.txt

Key Files

File                          Description
cg/instructions_*.json        CG-level operations (READ, COMPUTE, WRITE)
op/isa_instructions_*.json    ISA instructions per core (JSON)
op/isa_instructions.asm       ISA instructions (human-readable assembly)
sim/simulation_*.txt          Performance report

The * in filenames represents a timestamp suffix (e.g., instructions_20240115_143052.json). This allows multiple runs without overwriting previous results.
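
Once several runs accumulate, a script needs to choose one file per stage. Assuming the suffix uses a fixed-width date and time as in the example, names sort chronologically, so the last glob match is the newest run. The helper latest_file below is a hypothetical sketch built on that assumption.

```shell
#!/bin/sh
# Print the newest timestamped file matching DIR/PREFIX_*.EXT.
# Relies on fixed-width timestamps so that name order equals time order.
# usage: latest_file DIR PREFIX EXT
latest_file() {
    latest=""
    for f in "$1"/"$2"_*."$3"; do
        [ -e "$f" ] || continue     # the glob stays literal when nothing matches
        latest=$f                   # globs expand in sorted order; last match wins
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Demo with two empty stand-in files in a scratch directory.
demo=$(mktemp -d)
: > "$demo/instructions_20240115_143052.json"
: > "$demo/instructions_20240116_090000.json"
latest_file "$demo" instructions json
rm -rf "$demo"
```

The function returns a non-zero status when no file matches, so a caller can stop before passing an empty path to the next stage.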


Comparing Formats

JSON:

{"opcode":44,"rd":0,"imm":1060352},
{"opcode":44,"rd":1,"imm":0},
{"opcode":44,"rd":2,"imm":1728},
{"opcode":48,"rs":0,"rt":2,"rd":1,"imm":0},
{"opcode":20,"rs":1,"rt":2,"rd":4,"re":3,"funct":0}

Advantages:

  • Machine-readable
  • Complete metadata with register encodings
  • Easy to parse programmatically

Assembly:

G_LI r0, 1060352
G_LI r1, 0
G_LI r2, 1728
MEM_CPY r1, r0, r2, 0
VEC_ADD_1 r4, r1, r2, r3

Advantages:

  • Human-readable mnemonics
  • Compact representation
  • Familiar format for debugging

What You Learned

  • CG Compilation: Transform ONNX to high-level CIM operations
  • OP Compilation: Generate per-core ISA instructions
  • Simulation: Run cycle-accurate simulation and read the performance report
  • IR Inspection: Inspect JSON and assembly outputs
