
Quick Start

Run an end-to-end pipeline with one command

Estimated time: ~3 minutes

This tutorial demonstrates running ResNet-18 through the complete CIMFlow pipeline with a single command.

Prerequisite: Complete the Docker Tutorial first, or ensure CIMFlow is installed locally.


The Pipeline Command

CIMFlow compiles ONNX models to CIM hardware instructions and simulates execution. The run pipeline command handles the complete flow:

ONNX Model → CG Compile → OP Compile → Simulate → Report

Running the Pipeline

Configure Parameters

For this example, we use the following configuration:

Parameter    Value      Description
Model        ResNet-18  Standard CNN architecture
T            8          Macro group size
K            16         Macro group number
B            16         NoC bandwidth (flits)
C            64         Core count
Batch Size   8          Inference batch

Execute the Pipeline

cimflow run pipeline \
    -m data/models/resnet18.onnx \
    -o output/quickstart \
    -t 8 -k 16 -b 16 -c 64 \
    --batch-size 8

What happens during execution:

  • CG stage: Progress bar shows graph partitioning into execution stages
  • OP stage: Progress bar shows per-core instruction generation
  • Simulation: Cycle-accurate execution across all cores

View Results

After completion, check the output:

ls output/quickstart/
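
To print the report directly, you can rely on the simulation_report_*.txt name pattern shown in the next section (a small sketch; other files in the directory may vary between runs):

cat output/quickstart/simulation_report_*.txt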

Understanding the Results

The simulation produces a report with key performance metrics:

simulation_report_*.txt
Simulation Result:
  - latency:            2.08837 ms
  - average power:      4420.4895 mW
  - total energy:       9231608896.0073 pJ/it
  - TOPS:               0.5319
  - TOPS/W:             0.1203

Metric Definitions

Metric          Description
Latency         End-to-end inference time for one batch
Average Power   Power consumption during inference
Total Energy    Energy consumed per inference iteration
TOPS            Tera operations per second (throughput)
TOPS/W          Energy efficiency (throughput per watt)
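
These metrics are tied together by the usual definitions: energy is average power multiplied by latency, and TOPS/W is TOPS divided by average power in watts. The sketch below re-derives the energy and TOPS/W figures from the latency, power, and TOPS values in the sample report above; the only assumptions are those two relations and the unit conversion (1 mW·ms = 1e6 pJ).

awk 'BEGIN {
  latency_ms = 2.08837; power_mw = 4420.4895; tops = 0.5319
  # 1 mW * 1 ms = 1e-6 J = 1e6 pJ
  printf "total energy ~= %.4e pJ/it\n", power_mw * latency_ms * 1e6
  printf "TOPS/W       ~= %.4f\n", tops / (power_mw / 1000)
}'

Both values come out within rounding of the reported 9231608896 pJ/it and 0.1203.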

What Just Happened

The pipeline executed three stages:

1. CG Compilation
  • Parsed the ONNX model
  • Created compute graph representation
  • Partitioned computation for CIM architecture
2. OP Compilation
  • Generated ISA instructions for each core
  • Handled memory allocation and data movement
  • Optimized instruction scheduling
3. Simulation
  • Cycle-accurate simulation of CIM hardware
  • Modeled all 64 cores in parallel
  • Tracked power and energy consumption

Adjusting Parameters

Experiment with different configurations:

Reduce NoC bandwidth to 8 flits
cimflow run pipeline \
    -m data/models/resnet18.onnx \
    -o output/test_b8 \
    -t 8 -k 16 -b 8 -c 64 \
    --batch-size 8

Lower bandwidth increases communication latency but reduces hardware cost.

Larger batch for higher throughput
cimflow run pipeline \
    -m data/models/resnet18.onnx \
    -o output/test_batch16 \
    -t 8 -k 16 -b 16 -c 64 \
    --batch-size 16

Larger batches improve throughput by better utilizing the multi-core parallelism.
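
To sweep these knobs systematically instead of running one configuration at a time, a plain shell loop over the same flags works. The sketch below (output directory names are only illustrative) runs every combination of two bandwidths and two batch sizes:

for b in 8 16; do
  for batch in 8 16; do
    cimflow run pipeline \
        -m data/models/resnet18.onnx \
        -o output/sweep_b${b}_batch${batch} \
        -t 8 -k 16 -b ${b} -c 64 \
        --batch-size ${batch}
  done
done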


Trying Different Models

Run with MobileNetV2:

cimflow run pipeline \
    -m data/models/mobilenetv2.onnx \
    -o output/mobilenet \
    -t 8 -k 16 -b 16 -c 64 \
    --batch-size 8
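
To compare the two bundled models side by side, you can run them in a loop and then pull the headline numbers out of each report (the report file pattern is the one shown earlier; exact paths may differ on your setup):

for model in resnet18 mobilenetv2; do
  cimflow run pipeline \
      -m data/models/${model}.onnx \
      -o output/${model} \
      -t 8 -k 16 -b 16 -c 64 \
      --batch-size 8
done
grep -H -E "latency|TOPS" output/resnet18/simulation_report_*.txt output/mobilenetv2/simulation_report_*.txt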

Keeping Intermediate Files

Use --keep-ir to preserve all intermediate representations:

Keep all intermediate files
cimflow run pipeline \
    -m data/models/resnet18.onnx \
    -o output/debug \
    -t 8 -b 16 \
    --keep-ir

What gets saved

  • CG instructions (JSON) - High-level compute graph operations
  • ISA instructions (JSON and assembly) - Per-core instruction sequences
  • Detailed logs - Compilation and simulation traces
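
The JSON artifacts can be inspected with standard tools. The exact file names and layout under the output directory are not spelled out here, so the sketch below simply finds them and pretty-prints the first one:

find output/debug -name '*.json'
python3 -m json.tool "$(find output/debug -name '*.json' | head -n 1)" | head -n 40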

Verbose Output

Enable detailed logging to see compilation progress:

Enable verbose logging
cimflow run pipeline \
    -m data/models/resnet18.onnx \
    -o output/verbose \
    -t 8 -b 16 \
    -l VERBOSE

Log levels (from most to least verbose): TRACE | DEBUG | VERBOSE | INFO | WARNING | ERROR
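
When a verbose run produces a lot of output, it helps to keep a copy on disk as well as on the terminal. A common shell pattern for this (the log file name is arbitrary) is:

cimflow run pipeline \
    -m data/models/resnet18.onnx \
    -o output/verbose \
    -t 8 -b 16 \
    -l DEBUG 2>&1 | tee cimflow_debug.log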


What You Learned

  • Pipeline Execution — Run end-to-end compilation and simulation with one command
  • Results Interpretation — Understand latency, power, TOPS, and efficiency metrics
