
SimOps Concept Overview

The Problem

Modern robotics development faces a fundamental tension: training simulators prioritize speed and parallelism to produce policies at scale, while real-world deployment demands physically accurate validation before a policy ever touches hardware.

Today, most teams use a single simulator for both purposes — or skip validation entirely, relying on expensive physical prototypes. This creates:

  • Sim-to-real gaps that only surface on hardware
  • Slow iteration cycles when physical prototypes are the primary validation tool
  • No systematic quality gate between policy training and deployment

The SimOps Framework

SimOps introduces a clear separation of concerns:

```mermaid
flowchart TB
    subgraph Pipeline["SimOps Pipeline"]
        direction LR

        subgraph Training["Training Simulator"]
            direction TB
            T1["• Speed<br/>• Scale<br/>• Parallel"]
            T2["MuJoCo /<br/>Isaac Lab"]
            T1 --- T2
        end

        subgraph Validation["Validation Simulator"]
            direction TB
            V1["• Fidelity<br/>• Contact physics<br/>• Sensor models"]
            V2["AGX Dynamics +<br/>Unreal Engine 5"]
            V1 --- V2
        end

        subgraph Feedback["Feedback Loop"]
            F1["Validation results feed<br/>back into training config"]
        end

        Training -->|Policy Transfer| Validation
        Validation -->|Validation Results| Feedback
        Feedback -->|Re-training| Training
    end
```
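
The loop in this diagram can be sketched in a few lines of Python. This is a minimal illustration only: the function and field names (`train_policy`, `validate_policy`, `TrainingConfig`) are placeholders invented for this sketch, not a published SimOps API, and the stubs stand in for real calls into the training and validation simulators.

```python
from dataclasses import dataclass, field

@dataclass
class TrainingConfig:
    num_envs: int = 4096                                   # training-side parallelism
    scenarios: list = field(default_factory=list)          # grows with validation failures
    domain_randomization: dict = field(default_factory=dict)

def train_policy(config: TrainingConfig) -> str:
    """Stub: massively parallel RL in the training simulator (e.g. MuJoCo, Isaac Lab)."""
    return f"policy_trained_with_{len(config.scenarios)}_extra_scenarios"

def validate_policy(policy: str) -> tuple[str, list[str]]:
    """Stub: high-fidelity rollout in the validation simulator; returns a verdict and failed scenarios."""
    return "fail", ["low_friction_floor"]

def simops_cycle(config: TrainingConfig, max_cycles: int = 3):
    """Train -> validate -> feed results back, until a policy passes or the budget runs out."""
    for _ in range(max_cycles):
        policy = train_policy(config)
        verdict, failures = validate_policy(policy)
        if verdict == "pass":
            return policy                                  # promote to deployment candidate
        config.scenarios.extend(failures)                  # feedback: new training scenarios
    return None
```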

Training Simulator — Policy Production

The training simulator is optimized for throughput. Its job is to produce candidate policies as fast as possible through massively parallel reinforcement learning.

| Priority  | Approach                                         |
|-----------|--------------------------------------------------|
| Speed     | Simplified physics, GPU-accelerated environments |
| Scale     | Thousands of parallel instances                  |
| Iteration | Rapid experimentation with reward shaping        |

Typical engines: MuJoCo, Isaac Lab (Isaac Sim)
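
To make "thousands of parallel instances" concrete, the sketch below steps a toy batched simulator with NumPy. It is illustrative only: real training simulators (MuJoCo/MJX, Isaac Lab) run this on the GPU with far richer dynamics, but the shape of the computation is the same — one array operation advances every instance at once.

```python
import numpy as np

NUM_ENVS = 4096        # thousands of parallel instances
DT = 0.02              # coarse timestep: speed over fidelity

# State per environment: [position, velocity] of a trivial point mass.
state = np.zeros((NUM_ENVS, 2))

def step(state: np.ndarray, action: np.ndarray, dt: float = DT) -> np.ndarray:
    """Advance all environments with simplified (frictionless, contact-free) physics."""
    x, x_dot = state[:, 0], state[:, 1]
    x_dot = x_dot + action * dt          # acceleration taken directly from the action
    x = x + x_dot * dt
    return np.stack([x, x_dot], axis=1)

actions = np.random.uniform(-1.0, 1.0, size=NUM_ENVS)
state = step(state, actions)             # one call steps every instance
```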

Validation Simulator — Policy Verification

The validation simulator is optimized for fidelity. It acts as the quality gate before any policy reaches physical hardware.

| Priority          | Approach                                              |
|-------------------|-------------------------------------------------------|
| Physical accuracy | High-fidelity contact, friction, deformation models   |
| Sensor realism    | Camera, LiDAR, IMU models matching real hardware      |
| Scenario coverage | Edge cases, failure modes, adversarial conditions     |

Typical engines: AGX Dynamics (co-simulation with Unreal Engine 5)
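
A validation suite can be expressed declaratively and aggregated into a single gate decision. The sketch below assumes a hypothetical scenario schema and pass criterion; it is not an AGX Dynamics or Unreal Engine API, just one way to encode scenario coverage and the quality gate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ValidationScenario:
    name: str
    friction_coefficient: float    # high-fidelity contact parameter under test
    sensor_noise_std: float        # realistic IMU/LiDAR noise level
    is_edge_case: bool = False

SUITE = [
    ValidationScenario("nominal_pick", friction_coefficient=0.6, sensor_noise_std=0.01),
    ValidationScenario("low_friction_floor", 0.15, 0.01, is_edge_case=True),
    ValidationScenario("degraded_lidar", 0.6, 0.10, is_edge_case=True),
]

def gate(results: dict[str, bool], min_pass_rate: float = 0.95) -> str:
    """Aggregate per-scenario pass/fail results into the promote / retrain verdict."""
    rate = sum(results.values()) / len(results)
    if rate >= min_pass_rate:
        return "pass"
    return "marginal" if rate >= 0.8 else "fail"
```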

Closed-Loop Feedback

The key innovation is the closed loop: validation results automatically feed back into the training pipeline.

  • Pass → Policy is promoted to hardware deployment candidate
  • Fail → Failure analysis generates new training scenarios, reward adjustments, or domain randomization parameters
  • Marginal → Targeted re-training on specific failure modes

This creates a self-improving pipeline where each validation cycle makes training more effective.
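
The three outcomes above map directly onto actions against the next training run's configuration. The dispatch below is a sketch under assumed config keys and helper names; the verdict labels mirror the bullets, everything else is hypothetical.

```python
def apply_feedback(verdict: str, failures: list[str], config: dict) -> dict:
    """Fold a validation outcome back into the next training run's config."""
    if verdict == "pass":
        config["status"] = "deployment_candidate"              # promote toward hardware
    elif verdict == "fail":
        # Failure analysis: new training scenarios plus wider domain randomization.
        config.setdefault("scenarios", []).extend(failures)
        config.setdefault("domain_randomization", {})["friction"] = (0.1, 1.0)
    else:  # "marginal"
        # Targeted re-training: oversample only the failing scenarios.
        config["curriculum"] = {"focus": failures, "weight": 4.0}
    return config
```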

Why Not Just One Simulator?

Can't a single high-fidelity simulator do both?

In theory, yes. In practice, the requirements are contradictory:

  • Training needs 1,000+ parallel instances running at 10,000× real-time → requires simplified physics
  • Validation needs sub-millisecond timesteps with accurate contact dynamics → requires heavy computation per step

Forcing a single simulator to do both means either slow training or unreliable validation. SimOps eliminates this trade-off.
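
A back-of-envelope calculation makes the tension visible. The per-step compute costs below are assumptions chosen for illustration, not benchmarks; the point is the orders-of-magnitude gap, not the exact numbers.

```python
dt = 0.002                 # 2 ms simulation timestep

cheap_step = 2e-6          # assumed compute seconds per step, simplified physics
costly_step = 2e-3         # assumed compute seconds per step, high-fidelity contact

def realtime_factor(compute_per_step: float) -> float:
    """Simulated seconds advanced per second of wall-clock compute, per instance."""
    return dt / compute_per_step

print(realtime_factor(cheap_step))    # ~1000x real time per instance
print(realtime_factor(costly_step))   # ~1x real time per instance
# With thousands of simplified instances the aggregate reaches millions of times
# real time; the same fleet at high fidelity would barely keep pace with the clock.
```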

Analogy: DevOps → SimOps

Just as DevOps separated development and operations while automating the pipeline between them, SimOps separates training and validation while automating the feedback loop.

| DevOps                    | SimOps                                      |
|---------------------------|---------------------------------------------|
| Dev environment           | Training simulator                          |
| Staging / QA              | Validation simulator                        |
| CI/CD pipeline            | Automated policy promotion                  |
| Test results → bug fixes  | Validation failures → training adjustments  |
| Infrastructure as Code    | Simulation as Code                          |
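
One reading of the last row, Simulation as Code, is that the whole pipeline — which engines run, what the promotion gate is, how failures feed back — lives in version-controlled configuration rather than in tribal knowledge. The schema below is hypothetical, a sketch of what such a config could look like.

```python
SIMOPS_PIPELINE = {
    "training": {
        "engine": "isaac_lab",               # or "mujoco"
        "num_envs": 4096,
        "domain_randomization": {"friction": [0.2, 1.0], "mass_scale": [0.8, 1.2]},
    },
    "validation": {
        "engine": "agx_dynamics",
        "renderer": "unreal_engine_5",
        "scenario_suite": "suites/manipulation_edge_cases.yaml",   # hypothetical path
        "promotion_gate": {"min_pass_rate": 0.95},
    },
    "feedback": {
        "on_fail": "generate_scenarios",
        "on_marginal": "targeted_retraining",
    },
}
```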