
Design Exploration

Batch processing for hardware parameter exploration

Estimated time: ~8 minutes

This tutorial demonstrates batch processing to explore different hardware configurations and understand architecture trade-offs.

Prerequisite: Complete the Step-by-Step tutorial first to understand individual compilation stages.


Batch Processing Overview

Instead of running one configuration at a time, CIMFlow supports batch processing through JSON configuration files:

  • Design Space Exploration — Systematically sweep hardware parameters
  • Reproducible Experiments — Version-controlled configuration files
  • Structured Outputs — Organized results for each configuration

Exploration Matrix

We'll explore a 2x2 parameter matrix to understand how NoC bandwidth and batch size affect performance:

|                | Batch Size = 1    | Batch Size = 8    |
| -------------- | ----------------- | ----------------- |
| Bandwidth = 8  | Run 1: B8_batch1  | Run 2: B8_batch8  |
| Bandwidth = 16 | Run 3: B16_batch1 | Run 4: B16_batch8 |

Configuration File

Create a batch configuration file:

exploration.json
{
  "_comment": "Design Space Exploration: Bandwidth x Batch Size",
  "output_dir": "output/exploration",
  "keep_ir": false,
  "run_name": "design_exploration",
  "runs": [
    {
      "_name": "B8_batch1",
      "_comment": "Low bandwidth, single batch",
      "model_path": "data/models/resnet18.onnx",
      "t": 8,
      "k": 16,
      "b": 8,
      "c": 64,
      "batch_size": 1,
      "strategy": "dp"
    },
    {
      "_name": "B8_batch8",
      "_comment": "Low bandwidth, large batch",
      "model_path": "data/models/resnet18.onnx",
      "t": 8,
      "k": 16,
      "b": 8,
      "c": 64,
      "batch_size": 8,
      "strategy": "dp"
    },
    {
      "_name": "B16_batch1",
      "_comment": "High bandwidth, single batch",
      "model_path": "data/models/resnet18.onnx",
      "t": 8,
      "k": 16,
      "b": 16,
      "c": 64,
      "batch_size": 1,
      "strategy": "dp"
    },
    {
      "_name": "B16_batch8",
      "_comment": "High bandwidth, large batch",
      "model_path": "data/models/resnet18.onnx",
      "t": 8,
      "k": 16,
      "b": 16,
      "c": 64,
      "batch_size": 8,
      "strategy": "dp"
    }
  ]
}

Configuration Fields

| Field      | Description                                |
| ---------- | ------------------------------------------ |
| output_dir | Base directory for all run outputs         |
| keep_ir    | Whether to save intermediate representations |
| run_name   | Name prefix for this batch                 |
| runs       | Array of individual run configurations     |

Per-Run Fields

| Field      | Description                                   |
| ---------- | --------------------------------------------- |
| model_path | Path to the ONNX model (relative to the config file) |
| t, k, b, c | Hardware architecture parameters (b is the NoC bandwidth swept in this exploration) |
| batch_size | Inference batch size                          |
| strategy   | Mapping strategy (dp for dynamic programming) |
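
Writing each run by hand works for a 2x2 matrix, but larger sweeps get tedious. As a sketch of one way to script it, the following Python regenerates the four-run configuration (the script is our own illustration, not part of CIMFlow; it emits only the fields documented above):

generate_exploration.py
import itertools
import json

# Fields shared by every run, copied from the tutorial configuration.
base = {
    "model_path": "data/models/resnet18.onnx",
    "t": 8,
    "k": 16,
    "c": 64,
    "strategy": "dp",
}

# Sweep NoC bandwidth (b) against inference batch size.
runs = [
    dict(base, b=b, batch_size=n, _name=f"B{b}_batch{n}",
         _comment=f"Bandwidth {b}, batch size {n}")
    for b, n in itertools.product([8, 16], [1, 8])
]

config = {
    "output_dir": "output/exploration",
    "keep_ir": False,
    "run_name": "design_exploration",
    "runs": runs,
}

with open("exploration.json", "w") as f:
    json.dump(config, f, indent=2)

Extending the sweep is then a one-line change to the lists passed to itertools.product.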

Running Batch Exploration

The Docker tutorial image includes tutorial/exploration.json. For local installation, save the configuration above to a file before running.
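
A malformed JSON file fails only after the batch starts, so a quick syntax check before launching can save time. A minimal sketch in plain Python (nothing CIMFlow-specific; adjust the path to wherever you saved the file):

check_config.py
import json

# Load the batch configuration; json.load raises JSONDecodeError on bad syntax.
with open("tutorial/exploration.json") as f:
    cfg = json.load(f)

# Print the runs about to be launched as a quick sanity check.
for run in cfg["runs"]:
    print(run["_name"], "b =", run["b"], "batch_size =", run["batch_size"])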

Execute all configurations with a single command:

Run batch exploration
cimflow run from-file tutorial/exploration.json

What to observe during batch processing

  • Compilation time: Larger batch sizes take longer (more operations to compile, more instructions generated)
  • Simulation time: Lower bandwidth increases communication overhead
  • Progress: Each configuration runs the full CG + OP + Simulation pipeline

Results Comparison

After completion, results are organized by configuration:

ls output/exploration/

Sample Results Table

| Bandwidth | Batch Size | Latency (ms) | TOPS | TOPS/W |
| --------- | ---------- | ------------ | ---- | ------ |
| B=8       | 1          | 0.52         | 0.21 | 0.048  |
| B=8       | 8          | 2.89         | 0.38 | 0.086  |
| B=16      | 1          | 0.47         | 0.24 | 0.054  |
| B=16      | 8          | 2.09         | 0.53 | 0.120  |

Actual values depend on the model and hardware configuration. The table above shows typical relative trends.
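
The batch-size trade-off is easier to see per inference. The short sketch below divides the illustrative latencies from the sample table by batch size (these are the sample numbers above, not guaranteed outputs):

per_inference.py
# (bandwidth, batch_size) -> latency in ms, from the sample table above
samples = {
    (8, 1): 0.52,  (8, 8): 2.89,
    (16, 1): 0.47, (16, 8): 2.09,
}

for (bw, batch), latency_ms in sorted(samples.items()):
    print(f"B={bw:<2} batch={batch}: {latency_ms / batch:.3f} ms per inference")

With these numbers, batching eight inferences at B=16 cuts the per-inference cost from 0.47 ms to roughly 0.26 ms, even though the per-batch latency grows from 0.47 ms to 2.09 ms.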


Extracting Metrics

Parse results programmatically:

Extract key metrics from all reports
for report in output/exploration/simulation_report_*.txt; do
    echo "=== $(basename $report) ==="
    grep -E "latency:|TOPS:|TOPS/W:" "$report"
done

Tip

Pipe the output to a file or use tools like jq for structured analysis of multiple runs.
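
For instance, the metrics can be collected into a CSV for side-by-side comparison. A hedged Python sketch: it assumes each report contains lines shaped like latency: <value>, matching the grep pattern above; adjust the regex if your report format differs:

collect_metrics.py
import csv
import glob
import re

# Matches lines like "latency: 0.52" (assumed format, per the grep above).
METRIC = re.compile(r"^\s*(latency|TOPS/W|TOPS):\s*([0-9.eE+-]+)")

rows = []
for path in sorted(glob.glob("output/exploration/simulation_report_*.txt")):
    row = {"report": path}
    with open(path) as f:
        for line in f:
            m = METRIC.match(line)
            if m:
                row[m.group(1)] = m.group(2)
    rows.append(row)

with open("metrics.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["report", "latency", "TOPS", "TOPS/W"])
    writer.writeheader()
    writer.writerows(rows)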


Best Practices

Use descriptive names in the _name field:

{
  "_name": "resnet18_T8_B16_batch8",
  "_comment": "High throughput configuration"
}

This helps identify results in the output directory.

Structure output directories by experiment:

{
  "output_dir": "output/exp_2024_01_bandwidth_sweep",
  "run_name": "bandwidth_sweep"
}

Keep related experiments together.

Version control your configuration files:

git add exploration.json
git commit -m "Add bandwidth exploration config"

Include the config file with your results for reproducibility.


Saving Intermediate Files

For debugging, enable keep_ir:

{
  "output_dir": "output/debug_exploration",
  "keep_ir": true,
  "runs": [...]
}

This saves CG instructions, ISA, and logs for each run.


Key Observations

  1. Bandwidth vs Latency: Higher bandwidth reduces communication latency, especially for communication-bound workloads.

  2. Batch Size vs Throughput: Larger batches improve TOPS but increase per-batch latency.

  3. Efficiency Trade-offs: Optimal TOPS/W depends on model architecture and hardware configuration.

  4. Design Space Size: Even small parameter sets create large design spaces; sweeping just four values for each of t, k, b, and c already yields 4⁴ = 256 configurations. Batch processing enables systematic exploration.


Tutorial Complete

Congratulations! You've completed the CIMFlow tutorial series. You now have the skills to compile neural network models to CIM hardware and evaluate different architectural configurations.

What You Learned

  • Docker Tutorial — Setup and verification
  • Quick Start — End-to-end pipeline execution
  • Step-by-Step — Individual compilation stages
  • Design Exploration — Batch processing for parameter sweeps
