Curriculum Learning & Progressive Hardening¶
Beyond Pass/Fail¶
A naive validation pipeline asks only one question: did the policy pass or fail? SimOps goes further. Its closed-loop feedback generates structured training signals that automatically refine the policy through curriculum learning and progressive hardening.
```mermaid
flowchart LR
  subgraph Naive["❌ Naive Approach"]
    N1["Train"] --> N2["Validate"]
    N2 -->|Pass| N3["Deploy"]
    N2 -->|Fail| N4["Retrain<br/>from scratch"]
  end
  subgraph SimOps["✅ SimOps Approach"]
    S1["Train"] --> S2["Validate"]
    S2 -->|Pass| S3["Deploy"]
    S2 -->|Fail| S4["Analyze failure"]
    S4 --> S5["Auto-adjust<br/>curriculum"]
    S5 --> S1
  end
```
What Is Curriculum Learning?¶
Curriculum learning structures the training process from easy to hard, mirroring how humans learn. Instead of throwing the full complexity of a task at the agent from the start, the training environment gradually increases difficulty.
In SimOps, this curriculum is not manually designed — it is automatically generated from validation feedback.
```mermaid
flowchart TD
  subgraph CL["Automated Curriculum Pipeline"]
    V["🔬 Validation Results"] --> FA["📋 Failure Analysis"]
    FA --> FC["🏷️ Failure Classification"]
    FC --> CG["📐 Curriculum Generator"]
    CG --> L1["Level 1: Nominal conditions"]
    CG --> L2["Level 2: Mild perturbations"]
    CG --> L3["Level 3: Edge cases"]
    CG --> L4["Level 4: Adversarial scenarios"]
    L1 --> TR["🏋️ Training Simulator"]
    L2 --> TR
    L3 --> TR
    L4 --> TR
  end
```
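The classification-to-curriculum step can be sketched in a few lines. This is a minimal illustration, not a SimOps API: the names (`level_for_severity`, `generate_curriculum`) and the severity thresholds are assumptions chosen for clarity.

```python
# Hypothetical sketch: route classified failure modes to the curriculum
# level that should receive more training weight. Thresholds are assumptions.

LEVELS = {
    1: "Nominal conditions",
    2: "Mild perturbations",
    3: "Edge cases",
    4: "Adversarial scenarios",
}

def level_for_severity(severity: float) -> int:
    """Map a failure severity score in [0, 1] to a curriculum level."""
    if severity < 0.25:
        return 1
    if severity < 0.5:
        return 2
    if severity < 0.75:
        return 3
    return 4

def generate_curriculum(failures: list[tuple[str, float]]) -> dict[int, list[str]]:
    """Group (failure_mode, severity) pairs by the level they should harden."""
    curriculum: dict[int, list[str]] = {level: [] for level in LEVELS}
    for mode, severity in failures:
        curriculum[level_for_severity(mode and severity)].append(mode)
    return curriculum
```

Severe failures land in the adversarial levels, so the next training round concentrates difficulty exactly where validation found weakness.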
Curriculum Dimensions¶
The curriculum adjusts along multiple axes simultaneously:
| Dimension | Easy end | Hard end |
|---|---|---|
| Physics perturbation | Nominal parameters | ±30% mass, friction, damping |
| Sensor noise | Clean signals | Realistic noise + dropout |
| Task complexity | Static targets | Moving, accelerating targets |
| Environmental variation | Single environment | Randomized lighting, obstacles |
| Timing pressure | Relaxed deadlines | Real-time constraints |
| Failure recovery | No disturbances | External pushes, sensor failure |
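One simple way to realize these axes is to store each dimension as an easy-to-hard range and interpolate by a single difficulty scalar. The parameter names and values below are illustrative assumptions, not SimOps defaults.

```python
# Illustrative curriculum dimensions as (easy, hard) value pairs.
# Names and ranges are assumptions for the sketch.
DIMENSIONS = {
    "mass_perturbation_pct": (0.0, 30.0),  # nominal → ±30% mass perturbation
    "sensor_noise_std":      (0.0, 0.05),  # clean signals → realistic noise
    "target_speed_mps":      (0.0, 2.0),   # static target → fast-moving target
    "deadline_scale":        (2.0, 1.0),   # relaxed deadlines → real-time
}

def sample_settings(difficulty: float) -> dict[str, float]:
    """Linearly interpolate every dimension at a difficulty in [0, 1]."""
    assert 0.0 <= difficulty <= 1.0
    return {
        name: easy + difficulty * (hard - easy)
        for name, (easy, hard) in DIMENSIONS.items()
    }
```

Because all axes share one difficulty scalar, the curriculum can tighten or relax every dimension simultaneously, as the table describes.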
Progressive Hardening¶
Progressive hardening is the process by which a policy becomes increasingly robust through iterative validation-training cycles. Each cycle exposes new weaknesses, which are then addressed in the next training round.
```mermaid
flowchart TD
  subgraph Cycle1["Cycle 1: Basic Competence"]
    C1T["Train on nominal<br/>conditions"]
    C1V["Validate"]
    C1R["Result: 70% pass<br/>Fails on fast targets"]
    C1T --> C1V --> C1R
  end
  subgraph Cycle2["Cycle 2: Speed Robustness"]
    C2T["Train with expanded<br/>speed range"]
    C2V["Validate"]
    C2R["Result: 85% pass<br/>Fails on sensor noise"]
    C2T --> C2V --> C2R
  end
  subgraph Cycle3["Cycle 3: Sensor Robustness"]
    C3T["Train with realistic<br/>sensor artifacts"]
    C3V["Validate"]
    C3R["Result: 93% pass<br/>Fails on edge contacts"]
    C3T --> C3V --> C3R
  end
  subgraph Cycle4["Cycle 4: Contact Robustness"]
    C4T["Train with diverse<br/>contact scenarios"]
    C4V["Validate"]
    C4R["Result: 97% pass ✅<br/>Ready for hardware"]
    C4T --> C4V --> C4R
  end
  Cycle1 --> Cycle2 --> Cycle3 --> Cycle4
```
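The cycle structure above reduces to a short loop. The sketch below is a minimal illustration under stated assumptions: `train` and `validate` are user-supplied callables, and `validate` returns a pass rate plus the list of dominant failure modes, which is fed into the next training round.

```python
# Minimal sketch of the validate→analyze→retrain loop. The train/validate
# callables and their signatures are assumptions for this illustration.

def harden(policy, train, validate, target=0.97, max_cycles=10):
    """Iterate train→validate cycles until pass rate reaches the target.

    train(policy, failures)  -> new policy, biased toward recent failures
    validate(policy)         -> (pass_rate, list of failure-mode labels)
    """
    failures = []
    pass_rate = 0.0
    for cycle in range(1, max_cycles + 1):
        policy = train(policy, failures)      # curriculum biased by failures
        pass_rate, failures = validate(policy)
        if pass_rate >= target:
            return policy, cycle, pass_rate   # hardened: ready for hardware gate
    return policy, max_cycles, pass_rate      # budget exhausted without converging
```

With the pass rates from the diagram (70%, 85%, 93%, 97%), the loop terminates on the fourth cycle.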
The Hardening Gradient¶
Each hardening cycle operates at a specific feedback level:
```mermaid
graph BT
  subgraph Levels["Feedback Levels"]
    L1["Level 1: Reward Adjustment<br/>━━━━━━━━━━━━━━━━━━━<br/>Modify reward weights based on<br/>which failure modes dominate"]
    L2["Level 2: Domain Randomization<br/>━━━━━━━━━━━━━━━━━━━<br/>Expand parameter ranges where<br/>validation found brittleness"]
    L3["Level 3: Scenario Injection<br/>━━━━━━━━━━━━━━━━━━━<br/>Add specific failure scenarios<br/>as training environments"]
    L4["Level 4: Architecture Adaptation<br/>━━━━━━━━━━━━━━━━━━━<br/>Suggest network architecture changes<br/>when policy capacity is insufficient"]
  end
  L1 --> L2 --> L3 --> L4
```
| Level | Trigger | Action | Impact |
|---|---|---|---|
| 1. Reward Adjustment | Consistent failure pattern | Re-weight reward components | Low — fastest iteration |
| 2. Domain Randomization | Brittle at parameter boundaries | Expand randomization ranges | Medium — broader coverage |
| 3. Scenario Injection | Specific failure modes | Add targeted training scenarios | High — addresses root causes |
| 4. Architecture Adaptation | Policy capacity exhausted | Modify network size/structure | Highest — fundamental change |
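The trigger column of this table implies a dispatch rule: escalate to the highest feedback level whose trigger fires, since a higher-level problem cannot be fixed by a lower-level action. A hedged sketch, where the report keys are illustrative stand-ins for signals a real analyzer would derive from telemetry:

```python
# Sketch of level selection from the table above. The report keys
# (capacity_exhausted, etc.) are hypothetical names for this illustration.

def select_feedback_level(report: dict) -> int:
    """Escalate to the highest feedback level whose trigger fires."""
    if report.get("capacity_exhausted"):       # Level 4: architecture adaptation
        return 4
    if report.get("specific_failure_modes"):   # Level 3: scenario injection
        return 3
    if report.get("brittle_at_boundaries"):    # Level 2: expand randomization
        return 2
    return 1                                   # Level 1: re-weight rewards
```

Checking from Level 4 downward keeps the cheap, fast Level 1 adjustment as the default when no stronger trigger is present.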
Automated Feedback Loop Detail¶
From Validation Failure to Training Signal¶
```mermaid
sequenceDiagram
  participant VS as 🔬 Validation Sim
  participant FA as 📋 Failure Analyzer
  participant CG as 📐 Curriculum Generator
  participant TS as 🏋️ Training Sim
  VS->>FA: Validation results + telemetry
  FA->>FA: Classify failure modes
  FA->>FA: Compute severity scores
  FA->>CG: Failure report + root causes
  CG->>CG: Determine feedback level
  CG->>CG: Generate curriculum update
  alt Level 1: Reward
    CG->>TS: Updated reward weights
  else Level 2: Randomization
    CG->>TS: Expanded DR parameters
  else Level 3: Scenario
    CG->>TS: New training scenarios
  else Level 4: Architecture
    CG->>TS: Architecture recommendations
  end
  TS->>TS: Resume training with updates
  TS->>VS: New policy candidate
```
Failure Classification Taxonomy¶
The failure analyzer categorizes each failure into a structured taxonomy:
| Category | Subcategories | Typical Feedback Level |
|---|---|---|
| Kinematic failure | Joint limits, workspace boundary, singularity | Level 1–2 |
| Dynamic failure | Balance loss, excessive force, velocity limit | Level 2–3 |
| Contact failure | Missed contact, slip, unintended collision | Level 2–3 |
| Perception failure | Object misdetection, depth error, latency | Level 2–3 |
| Planning failure | Suboptimal trajectory, timing error | Level 1–3 |
| Generalization failure | Works in training distribution, fails outside | Level 3–4 |
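The taxonomy lends itself to a simple lookup table mapping each subcategory to its parent category and typical feedback-level range. The snake_case labels below are illustrative encodings of the table rows, not canonical identifiers.

```python
# The failure taxonomy above as a lookup: category -> (subcategories,
# typical feedback-level range). Labels are illustrative encodings.

TAXONOMY = {
    "kinematic":      (["joint_limits", "workspace_boundary", "singularity"], (1, 2)),
    "dynamic":        (["balance_loss", "excessive_force", "velocity_limit"], (2, 3)),
    "contact":        (["missed_contact", "slip", "unintended_collision"],    (2, 3)),
    "perception":     (["object_misdetection", "depth_error", "latency"],     (2, 3)),
    "planning":       (["suboptimal_trajectory", "timing_error"],             (1, 3)),
    "generalization": (["out_of_distribution"],                               (3, 4)),
}

def classify(subcategory: str) -> tuple[str, tuple[int, int]]:
    """Return (parent category, feedback-level range) for a subcategory."""
    for category, (subs, levels) in TAXONOMY.items():
        if subcategory in subs:
            return category, levels
    raise KeyError(f"unknown failure subcategory: {subcategory}")
```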
Convergence Monitoring¶
SimOps tracks hardening progress to detect convergence and identify diminishing returns:
```mermaid
graph LR
  subgraph Monitoring["Convergence Dashboard"]
    direction TB
    M1["📈 Pass rate trend across cycles"]
    M2["📊 Failure mode distribution shift"]
    M3["🎯 Gap metrics convergence"]
    M4["⏱️ Training efficiency per cycle"]
    M5["🔄 New failure discovery rate"]
  end
```
Convergence Criteria¶
A policy is considered hardened when:
- Pass rate exceeds target threshold (e.g., > 95%) across all scenario categories
- No new failure modes discovered in the last N validation runs
- Gap metrics are within bounded tolerances (see Sim-to-Real Gap Quantification)
- Marginal improvement per cycle drops below efficiency threshold
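The four criteria combine into a single predicate, since all must hold simultaneously. A minimal sketch, where the threshold defaults mirror the example values above and the inputs are assumed to come from the convergence dashboard:

```python
# Sketch of the convergence criteria as one predicate. Threshold defaults
# follow the example values in the text; input names are illustrative.

def is_hardened(pass_rates: dict[str, float],
                new_failures_last_n: int,
                gap_within_tolerance: bool,
                marginal_improvement: float,
                pass_threshold: float = 0.95,
                min_improvement: float = 0.005) -> bool:
    """True only when all four convergence criteria hold simultaneously."""
    return bool(
        all(rate > pass_threshold for rate in pass_rates.values())  # every category
        and new_failures_last_n == 0        # no new failure modes in last N runs
        and gap_within_tolerance            # sim-to-real gap metrics bounded
        and marginal_improvement < min_improvement  # diminishing returns reached
    )
```

Note that a single scenario category below threshold blocks convergence, which prevents a high aggregate pass rate from masking a weak category.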
Key Insight
Progressive hardening transforms validation from a binary gate into a continuous improvement engine. Each failure makes the next policy stronger, and the system knows when it has converged.
Comparison: Manual vs SimOps Hardening¶
| Aspect | Manual Approach | SimOps Automated |
|---|---|---|
| Failure analysis | Engineer inspects logs | Automated classification |
| Curriculum design | Hand-crafted difficulty levels | Auto-generated from failures |
| Feedback speed | Days (human in the loop) | Minutes (automated pipeline) |
| Coverage | Limited by engineer intuition | Systematic + exhaustive |
| Reproducibility | Low — depends on individual | High — deterministic pipeline |
| Convergence detection | Subjective judgment | Quantitative criteria |