
Design Exploration

Batch processing for hardware parameter exploration

Estimated time: ~8 minutes

This tutorial demonstrates batch processing to explore different hardware configurations and understand architecture trade-offs.

Prerequisite: Complete the Step-by-Step tutorial first to understand individual compilation stages.


Batch Processing Overview

Instead of running one configuration at a time, CIMFlow supports batch processing through JSON configuration files:

  • Design Space Exploration — Systematically sweep hardware parameters
  • Reproducible Experiments — Version-controlled configuration files
  • Structured Outputs — Organized results for each configuration

Exploration Matrix

We'll explore a 2x2 parameter matrix to understand how NoC bandwidth and batch size affect performance:

|                | Batch Size = 1    | Batch Size = 8    |
| -------------- | ----------------- | ----------------- |
| Bandwidth = 8  | Run 1: B8_batch1  | Run 2: B8_batch8  |
| Bandwidth = 16 | Run 3: B16_batch1 | Run 4: B16_batch8 |

Configuration File

Create a batch configuration file:

exploration.json
{
  "_comment": "Design Space Exploration: Bandwidth x Batch Size",
  "output_dir": "output/exploration",
  "keep_ir": false,
  "run_name": "design_exploration",
  "runs": [
    {
      "_name": "B8_batch1",
      "_comment": "Low bandwidth, single batch",
      "model_path": "data/models/resnet18.onnx",
      "t": 8,
      "k": 16,
      "b": 8,
      "c": 64,
      "batch_size": 1,
      "strategy": "dp"
    },
    {
      "_name": "B8_batch8",
      "_comment": "Low bandwidth, large batch",
      "model_path": "data/models/resnet18.onnx",
      "t": 8,
      "k": 16,
      "b": 8,
      "c": 64,
      "batch_size": 8,
      "strategy": "dp"
    },
    {
      "_name": "B16_batch1",
      "_comment": "High bandwidth, single batch",
      "model_path": "data/models/resnet18.onnx",
      "t": 8,
      "k": 16,
      "b": 16,
      "c": 64,
      "batch_size": 1,
      "strategy": "dp"
    },
    {
      "_name": "B16_batch8",
      "_comment": "High bandwidth, large batch",
      "model_path": "data/models/resnet18.onnx",
      "t": 8,
      "k": 16,
      "b": 16,
      "c": 64,
      "batch_size": 8,
      "strategy": "dp"
    }
  ]
}

Configuration Fields

| Field      | Description                                |
| ---------- | ------------------------------------------ |
| output_dir | Base directory for all run outputs         |
| keep_ir    | Whether to save intermediate representations |
| run_name   | Name prefix for this batch                 |
| runs       | Array of individual run configurations     |

Per-Run Fields

| Field      | Description                                   |
| ---------- | --------------------------------------------- |
| model_path | Path to the ONNX model (relative to the config file) |
| t, k, b, c | Hardware architecture parameters (b is the NoC bandwidth swept in this exploration) |
| batch_size | Inference batch size                          |
| strategy   | Mapping strategy (dp for dynamic programming) |
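
Writing each run by hand works for a 2x2 matrix, but larger sweeps get tedious. As a sketch of one way to script it, the following Python regenerates the four-run configuration (the script is our own illustration, not part of CIMFlow; it emits only the fields documented above):

generate_exploration.py
import itertools
import json

# Fields shared by every run, copied from the tutorial configuration.
base = {
    "model_path": "data/models/resnet18.onnx",
    "t": 8,
    "k": 16,
    "c": 64,
    "strategy": "dp",
}

# Sweep NoC bandwidth (b) against inference batch size.
runs = [
    dict(base, b=b, batch_size=n, _name=f"B{b}_batch{n}",
         _comment=f"Bandwidth {b}, batch size {n}")
    for b, n in itertools.product([8, 16], [1, 8])
]

config = {
    "output_dir": "output/exploration",
    "keep_ir": False,
    "run_name": "design_exploration",
    "runs": runs,
}

with open("exploration.json", "w") as f:
    json.dump(config, f, indent=2)

Extending the sweep is then a one-line change to the lists passed to itertools.product.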

Running Batch Exploration

The Docker tutorial image includes tutorial/exploration.json. For local installation, save the configuration above to a file before running.
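
A malformed JSON file fails only after the batch starts, so a quick syntax check before launching can save time. A minimal sketch in plain Python (nothing CIMFlow-specific; adjust the path to wherever you saved the file):

check_config.py
import json

# Load the batch configuration; json.load raises JSONDecodeError on bad syntax.
with open("tutorial/exploration.json") as f:
    cfg = json.load(f)

# Print the runs about to be launched as a quick sanity check.
for run in cfg["runs"]:
    print(run["_name"], "b =", run["b"], "batch_size =", run["batch_size"])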

Execute all configurations with a single command:

Run batch exploration
cimflow run from-file tutorial/exploration.json

What to observe during batch processing

  • Compilation time: Larger batch sizes take longer (more operations to compile, more instructions generated)
  • Simulation time: Lower bandwidth increases communication overhead
  • Progress: Each configuration runs the full CG + OP + Simulation pipeline

Results Comparison

After completion, results are organized by configuration:

ls output/exploration/

Sample Results Table

| Bandwidth | Batch Size | Latency (ms) | TOPS | TOPS/W |
| --------- | ---------- | ------------ | ---- | ------ |
| B=8       | 1          | 0.52         | 0.21 | 0.048  |
| B=8       | 8          | 2.89         | 0.38 | 0.086  |
| B=16      | 1          | 0.47         | 0.24 | 0.054  |
| B=16      | 8          | 2.09         | 0.53 | 0.120  |

Actual values depend on the model and hardware configuration. The table above shows typical relative trends.
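
The batch-size trade-off is easier to see per inference. The short sketch below divides the illustrative latencies from the sample table by batch size (these are the sample numbers above, not guaranteed outputs):

per_inference.py
# (bandwidth, batch_size) -> latency in ms, from the sample table above
samples = {
    (8, 1): 0.52,  (8, 8): 2.89,
    (16, 1): 0.47, (16, 8): 2.09,
}

for (bw, batch), latency_ms in sorted(samples.items()):
    print(f"B={bw:<2} batch={batch}: {latency_ms / batch:.3f} ms per inference")

With these numbers, batching eight inferences at B=16 cuts the per-inference cost from 0.47 ms to roughly 0.26 ms, even though the per-batch latency grows from 0.47 ms to 2.09 ms.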


Extracting Metrics

Parse results programmatically:

Extract key metrics from all reports
for report in output/exploration/simulation_report_*.txt; do
    echo "=== $(basename $report) ==="
    grep -E "latency:|TOPS:|TOPS/W:" "$report"
done

Tip

Pipe the output to a file or use tools like jq for structured analysis of multiple runs.
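
For instance, the metrics can be collected into a CSV for side-by-side comparison. A hedged Python sketch: it assumes each report contains lines shaped like latency: <value>, matching the grep pattern above; adjust the regex if your report format differs:

collect_metrics.py
import csv
import glob
import re

# Matches lines like "latency: 0.52" (assumed format, per the grep above).
METRIC = re.compile(r"^\s*(latency|TOPS/W|TOPS):\s*([0-9.eE+-]+)")

rows = []
for path in sorted(glob.glob("output/exploration/simulation_report_*.txt")):
    row = {"report": path}
    with open(path) as f:
        for line in f:
            m = METRIC.match(line)
            if m:
                row[m.group(1)] = m.group(2)
    rows.append(row)

with open("metrics.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["report", "latency", "TOPS", "TOPS/W"])
    writer.writeheader()
    writer.writerows(rows)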


Best Practices

Use descriptive names in the _name field:

{
  "_name": "resnet18_T8_B16_batch8",
  "_comment": "High throughput configuration"
}

This helps identify results in the output directory.

Structure output directories by experiment:

{
  "output_dir": "output/exp_2024_01_bandwidth_sweep",
  "run_name": "bandwidth_sweep"
}

Keep related experiments together.

Version control your configuration files:

git add exploration.json
git commit -m "Add bandwidth exploration config"

Include the config file with your results for reproducibility.


Saving Intermediate Files

For debugging, enable keep_ir:

{
  "output_dir": "output/debug_exploration",
  "keep_ir": true,
  "runs": [...]
}

This saves CG instructions, ISA, and logs for each run.


Key Observations

  1. Bandwidth vs Latency: Higher bandwidth reduces communication latency, especially for communication-bound workloads.

  2. Batch Size vs Throughput: Larger batches improve TOPS but increase per-batch latency.

  3. Efficiency Trade-offs: Optimal TOPS/W depends on model architecture and hardware configuration.

  4. Design Space Size: Even small parameter sets create large design spaces; sweeping just four values for each of t, k, b, and c already yields 4⁴ = 256 configurations. Batch processing enables systematic exploration.


Tutorial Complete

Congratulations! You've completed the CIMFlow tutorial series. You now have the skills to compile neural network models to CIM hardware and evaluate different architectural configurations.

What You Learned

  • Docker Tutorial — Setup and verification
  • Quick Start — End-to-end pipeline execution
  • Step-by-Step — Individual compilation stages
  • Design Exploration — Batch processing for parameter sweeps
