SimOps Concept Overview¶
The Problem¶
Modern robotics development faces a fundamental tension: training simulators prioritize speed and parallelism to produce policies at scale, while real-world deployment demands physically accurate validation before a policy ever touches hardware.
Today, most teams use a single simulator for both purposes — or skip validation entirely, relying on expensive physical prototypes. This creates:
- Sim-to-real gaps that only surface on hardware
- Slow iteration cycles when physical prototypes are the primary validation tool
- No systematic quality gate between policy training and deployment
The SimOps Framework¶
SimOps introduces a clear separation of concerns:
```mermaid
flowchart TB
    subgraph Pipeline["SimOps Pipeline"]
        direction LR
        subgraph Training["Training Simulator"]
            direction TB
            T1["• Speed<br/>• Scale<br/>• Parallel"]
            T2["MuJoCo /<br/>Isaac Lab"]
            T1 --- T2
        end
        subgraph Validation["Validation Simulator"]
            direction TB
            V1["• Fidelity<br/>• Contact physics<br/>• Sensor models"]
            V2["AGX Dynamics +<br/>Unreal Engine 5"]
            V1 --- V2
        end
        subgraph Feedback["Feedback Loop"]
            F1["Validation results feed<br/>back into training config"]
        end
        Training -->|Policy Transfer| Validation
        Validation -->|Validation Results| Feedback
        Feedback -->|Re-training| Training
    end
```
Training Simulator — Policy Production¶
The training simulator is optimized for throughput. Its job is to produce candidate policies as fast as possible through massively parallel reinforcement learning.
| Priority | Approach |
|---|---|
| Speed | Simplified physics, GPU-accelerated environments |
| Scale | Thousands of parallel instances |
| Iteration | Rapid experimentation with reward shaping |
Typical engines: MuJoCo, Isaac Lab (Isaac Sim)
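The throughput-first pattern underlying all of these engines is batched stepping: one array operation advances every environment instance at once. A minimal pure-NumPy sketch of the idea — a toy point-mass environment invented for illustration, not MuJoCo or Isaac Lab code:

```python
import numpy as np

class BatchedPointEnv:
    """Toy vectorized environment: N point masses pushed toward the origin.

    Stands in for the thousands of parallel instances a training simulator
    steps on the GPU; here NumPy batches the physics on the CPU purely for
    illustration.
    """

    def __init__(self, num_envs: int, dt: float = 0.01, seed: int = 0):
        self.num_envs = num_envs
        self.dt = dt
        rng = np.random.default_rng(seed)
        self.pos = rng.uniform(-1.0, 1.0, size=(num_envs, 2))

    def step(self, actions: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """Advance all environments one timestep with a single batched op."""
        self.pos += self.dt * actions                 # simplified physics: pure kinematics
        rewards = -np.linalg.norm(self.pos, axis=1)   # closer to origin = higher reward
        return self.pos.copy(), rewards

env = BatchedPointEnv(num_envs=1024)
actions = -env.pos                # trivial "policy": move toward the origin
obs, rew = env.step(actions)
print(obs.shape, rew.shape)       # (1024, 2) (1024,)
```

Note that the per-step cost is one vectorized update regardless of `num_envs` — the property that lets training simulators trade physical fidelity for scale.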
Validation Simulator — Policy Verification¶
The validation simulator is optimized for fidelity. It acts as the quality gate before any policy reaches physical hardware.
| Priority | Approach |
|---|---|
| Physical accuracy | High-fidelity contact, friction, deformation models |
| Sensor realism | Camera, LiDAR, IMU models matching real hardware |
| Scenario coverage | Edge cases, failure modes, adversarial conditions |
Typical engines: AGX Dynamics (co-simulation with Unreal Engine 5)
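Scenario coverage is naturally expressed as a parameter grid over the conditions that matter. A hypothetical sketch — the parameter names and ranges are invented for illustration and are not an AGX Dynamics API:

```python
import itertools

# Hypothetical validation scenario grid. Parameters and ranges are
# illustrative assumptions, not any engine's actual configuration schema.
friction_coeffs = [0.2, 0.5, 0.9]   # low-friction edge cases upward
payload_kg = [0.0, 2.5, 5.0]        # unloaded through maximum payload
lidar_dropout = [0.0, 0.1]          # fraction of dropped LiDAR returns

scenarios = [
    {"friction": f, "payload_kg": m, "lidar_dropout": d}
    for f, m, d in itertools.product(friction_coeffs, payload_kg, lidar_dropout)
]
print(len(scenarios))  # 18 combinations
```

Exhaustive grids grow fast; in practice teams prune toward the edge cases and failure modes the table above calls out.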
Closed-Loop Feedback¶
The key innovation is the closed loop: validation results automatically feed back into the training pipeline.
- Pass → Policy is promoted to hardware deployment candidate
- Fail → Failure analysis generates new training scenarios, reward adjustments, or domain randomization parameters
- Marginal → Targeted re-training on specific failure modes
This creates a self-improving pipeline where each validation cycle makes training more effective.
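The Pass / Fail / Marginal routing above can be sketched as a small gate function. The report fields and thresholds here are illustrative assumptions, not part of any published SimOps API:

```python
from dataclasses import dataclass

# Hypothetical report shape and thresholds -- the framework defines the
# Pass/Fail/Marginal outcomes, not these particular names or numbers.
@dataclass
class ValidationReport:
    success_rate: float          # fraction of validation scenarios completed
    failed_scenarios: list[str]  # IDs of scenarios the policy failed

PASS_THRESHOLD = 0.95
MARGINAL_THRESHOLD = 0.80

def route_policy(report: ValidationReport) -> dict:
    """Route a validated policy according to the closed-loop rules."""
    if report.success_rate >= PASS_THRESHOLD:
        # Pass: promote to hardware deployment candidate
        return {"action": "promote_to_hardware_candidate"}
    if report.success_rate >= MARGINAL_THRESHOLD:
        # Marginal: targeted re-training on the specific failure modes
        return {"action": "retrain_targeted",
                "scenarios": report.failed_scenarios}
    # Fail: regenerate training config (new scenarios, reward adjustments,
    # or domain randomization parameters)
    return {"action": "regenerate_training_config",
            "scenarios": report.failed_scenarios}

print(route_policy(ValidationReport(0.97, [])))
```

The important design point is that every branch produces a machine-readable action, so the feedback loop can run without a human in it.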
Why Not Just One Simulator?¶
Can't a single high-fidelity simulator do both?
In theory, yes. In practice, the requirements are contradictory:
- Training needs 1,000+ parallel instances running at 10,000× real-time → requires simplified physics
- Validation needs sub-millisecond timesteps with accurate contact dynamics → requires heavy computation per step
Forcing a single simulator to do both means either slow training or unreliable validation. SimOps eliminates this trade-off.
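A back-of-envelope calculation makes the contradiction concrete. The per-step compute costs below are illustrative assumptions, not measurements of any engine:

```python
# Why one simulator can't serve both roles: required compute at training
# throughput. All numbers are illustrative assumptions, not benchmarks.
aggregate_speedup = 10_000    # training throughput target (x real-time, all envs combined)
control_hz = 1_000            # 1 ms control timestep
steps_per_wall_second = aggregate_speedup * control_hz   # 10,000,000 env-steps/s

fast_step_cost = 1e-6   # seconds of compute per simplified-physics step
slow_step_cost = 1e-2   # seconds of compute per high-fidelity contact step

cores_fast = steps_per_wall_second * fast_step_cost   # ~10 cores
cores_slow = steps_per_wall_second * slow_step_cost   # ~100,000 cores
print(cores_fast, cores_slow)
```

Under these assumptions, hitting training throughput with validation-grade physics would take roughly ten thousand times the hardware — which is why the two roles get two simulators.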
Analogy: DevOps → SimOps¶
Just as DevOps separated development and operations while automating the pipeline between them, SimOps separates training and validation while automating the feedback loop.
| DevOps | SimOps |
|---|---|
| Dev environment | Training simulator |
| Staging / QA | Validation simulator |
| CI/CD pipeline | Automated policy promotion |
| Test results → bug fixes | Validation failures → training adjustments |
| Infrastructure as Code | Simulation as Code |
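By analogy with Infrastructure as Code, "Simulation as Code" declares both simulators and the promotion rule in a single versioned manifest. A hypothetical sketch — every field name here is invented for illustration:

```python
# Hypothetical "Simulation as Code" manifest, mirroring how Infrastructure
# as Code declares environments declaratively. Field names are illustrative.
simops_pipeline = {
    "training": {
        "engine": "isaac_lab",
        "num_envs": 4096,
        "physics": "simplified",
    },
    "validation": {
        "engine": "agx_dynamics",
        "renderer": "unreal_engine_5",
        "scenarios": ["nominal", "low_friction", "sensor_dropout"],
    },
    "promotion": {
        "pass_threshold": 0.95,
        "on_fail": "regenerate_training_config",
    },
}
```

Keeping the manifest in version control gives the same benefit IaC gives DevOps: any policy's full training-and-validation setup is reproducible from the repository.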