Teleoperation¶
Teleoperation allows a human operator to remotely control a robot in real time. In the context of embodied AI, it serves as a critical tool for collecting expert demonstrations, validating robot capabilities, and enabling human-in-the-loop autonomy.
Why Teleoperation Matters for AI¶
Teleoperation is a bridge between human intelligence and robot learning:
- Data collection: Generate expert demonstrations for imitation learning
- Task validation: Verify that a task is physically feasible before training a policy
- Shared autonomy: Human handles hard parts, autonomy handles routine parts
- Safety: Human oversight during deployment of learned policies
Types of Teleoperation¶
By Input Device¶
| Device | Degrees of Freedom | Latency | Ease of Use | Use Case |
|---|---|---|---|---|
| Keyboard/Gamepad | Low (6-8) | Low | Easy | Mobile base, simple tasks |
| 3D SpaceMouse | 6 DOF | Low | Medium | Arm manipulation |
| VR Controllers | 6 DOF per hand + fingers | Medium | Easy | Bimanual manipulation |
| Exoskeleton | Full body | Low | Hard | Humanoid whole-body |
| Hand tracking | Per-finger control | Medium | Easy | Dexterous manipulation |
| Motion capture | Full body | Low | Hard (setup) | Locomotion + manipulation |
By Control Mode¶
Joint-space teleoperation: Operator commands map directly to robot joint angles. Simple but unintuitive for complex robots.
Task-space teleoperation: Operator commands map to end-effector position/orientation (Cartesian space). More intuitive but requires inverse kinematics.
Retargeting-based: Map human body motion to robot body motion, accounting for morphology differences.
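The difference between joint-space and task-space control can be illustrated with a minimal sketch. It assumes a planar two-link arm with made-up link lengths (`L1`, `L2`); the function names are hypothetical and not taken from any particular teleoperation stack. Joint-space control passes operator angles through, while task-space control has to solve inverse kinematics first.

```python
import math

L1, L2 = 0.3, 0.25  # hypothetical link lengths in metres


def joint_space_command(operator_angles, scale=1.0):
    """Joint-space: operator joint angles map directly to robot joint angles."""
    return [scale * a for a in operator_angles]


def forward_kinematics(q):
    """End-effector (x, y) of the planar two-link arm for joint angles q."""
    x = L1 * math.cos(q[0]) + L2 * math.cos(q[0] + q[1])
    y = L1 * math.sin(q[0]) + L2 * math.sin(q[0] + q[1])
    return x, y


def task_space_command(target_xy):
    """Task-space: solve analytic IK for a Cartesian target (one elbow branch)."""
    x, y = target_xy
    cos_q2 = (x * x + y * y - L1 * L1 - L2 * L2) / (2 * L1 * L2)
    # Clamp so an unreachable target yields a fully extended or folded arm
    # instead of a math domain error.
    cos_q2 = max(-1.0, min(1.0, cos_q2))
    q2 = math.acos(cos_q2)
    q1 = math.atan2(y, x) - math.atan2(L2 * math.sin(q2), L1 + L2 * math.cos(q2))
    return [q1, q2]


if __name__ == "__main__":
    q = task_space_command((0.4, 0.2))
    print("joint angles:", q)
    print("reached position:", forward_kinematics(q))
```

Real arms have more joints and redundant solutions, so task-space systems usually rely on a numerical IK solver rather than a closed-form one like this.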
Key Systems¶
ALOHA (A Low-cost Open-source Hardware System for Bimanual Teleoperation)¶
ALOHA (Zhao et al., 2023) is an influential low-cost teleoperation system:
- Hardware: Leader-follower arm pairs (operator moves leader, follower copies)
- Cost: ~$20K total (vs. $100K+ for industrial systems)
- Capabilities: Fine-grained bimanual manipulation
- Data quality: High-quality demonstrations for imitation learning, used in the original paper to train the ACT (Action Chunking with Transformers) policy
Mobile ALOHA: Extends ALOHA with a mobile base for whole-room tasks.
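The leader-follower idea can be sketched in a few lines. This is an illustrative loop under stated assumptions, not ALOHA's actual software: `follower_step`, `record_episode`, and the per-tick rate limit `max_step` are hypothetical. The follower moves toward the leader's joint angles each tick, and each tick is logged as an (observation, action) pair, with the leader reading used as the action label.

```python
def follower_step(follower_q, leader_q, max_step=0.05):
    """Move each follower joint toward the leader joint, at most max_step rad per tick."""
    next_q = []
    for f, l in zip(follower_q, leader_q):
        delta = max(-max_step, min(max_step, l - f))
        next_q.append(f + delta)
    return next_q


def record_episode(leader_trajectory, follower_q, max_step=0.05):
    """Replay leader readings and log one (observation, action) pair per tick."""
    log = []
    for leader_q in leader_trajectory:
        # Observation: follower state before the command is applied.
        # Action: the leader joint angles the operator produced.
        log.append({"observation": list(follower_q), "action": list(leader_q)})
        follower_q = follower_step(follower_q, leader_q, max_step)
    return log, follower_q


if __name__ == "__main__":
    leader_readings = [[0.12, -0.3]] * 10  # operator holds one pose for 10 ticks
    log, final_q = record_episode(leader_readings, [0.0, 0.0])
    print(len(log), "samples recorded; follower ended at", final_q)
```

A real system would also log camera images and run at a fixed control rate. The rate limit here only stands in for the follower's finite tracking speed.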
GELLO¶
GELLO (Wu et al., 2024): A general, low-cost teleoperation device:
- 3D-printed, kinematically matched to the target robot arm
- Low-latency joint-space control
- Can be built for under $300
UMI (Universal Manipulation Interface)¶
UMI (Chi et al., 2024): A hand-held gripper with tracking:
- Operator holds and moves a gripper directly
- Tracking via GoPro + SLAM provides 6-DOF pose
- No robot needed during demonstration: data can be collected in any environment
- Policy trained on the collected real-world demonstrations, then deployed on various robot arms
VR-Based Systems¶
VR headsets (Meta Quest, Apple Vision Pro) provide:
- Stereoscopic camera view from robot head
- 6-DOF hand tracking per hand
- Finger tracking for dexterous manipulation
- Immersive experience for the operator
Exoskeleton-Based Systems¶
For humanoid robots, whole-body systems built on exoskeletons or motion capture typically combine:
- Full-body motion capture (suit or marker-based)
- Real-time retargeting from human skeleton to robot skeleton
- Handles morphology differences (different limb lengths, DOF)
Retargeting: Human to Robot¶
When the human and the robot have different morphologies, retargeting maps human motion to robot motion.
Position-Based Retargeting¶
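Map human keypoint positions to robot end-effector positions. One common form scales and offsets the human keypoint into the robot's workspace:
\[ p_{\text{ee}} = s \, p_{\text{human}} + p_0 \]
where \(p_{\text{human}}\) is the keypoint position relative to a human reference point such as the shoulder, \(s\) is a workspace scale factor, and \(p_0\) is the corresponding reference point on the robot (these symbols are illustrative).
Then use inverse kinematics (IK) to solve for robot joint angles.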
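A minimal sketch of the position-mapping step, before IK is applied, might look like the following. It assumes the human wrist is measured relative to the shoulder and scaled by the ratio of robot reach to human arm length; the function name, parameters, and numbers are hypothetical. Targets beyond the robot's reach are pulled back onto the workspace boundary so the IK solver always receives a feasible target.

```python
import math


def retarget_position(wrist, shoulder, human_arm_len, robot_reach,
                      robot_base=(0.0, 0.0, 0.0)):
    """Map a human wrist position to a robot end-effector target (3D, metres)."""
    scale = robot_reach / human_arm_len
    # Wrist position relative to the shoulder, scaled to the robot's size.
    rel = [scale * (w - s) for w, s in zip(wrist, shoulder)]
    dist = math.sqrt(sum(c * c for c in rel))
    if dist > robot_reach:
        # Project out-of-reach targets back onto the reachable sphere.
        rel = [c * robot_reach / dist for c in rel]
    return tuple(b + c for b, c in zip(robot_base, rel))


if __name__ == "__main__":
    target = retarget_position(
        wrist=(0.3, 0.0, 1.4),
        shoulder=(0.0, 0.0, 1.4),
        human_arm_len=0.6,
        robot_reach=0.9,
    )
    print("end-effector target:", target)
```

The resulting target would then be handed to an IK solver. Orientation retargeting is omitted here for brevity.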
Joint-Angle Retargeting¶
Map human joint angles directly to robot joint angles, with appropriate scaling:
\[ q_{\text{robot}} = A \, q_{\text{human}} + b \]
where \(A\) and \(b\) account for joint range differences and kinematic mapping.
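As a rough illustration, the affine joint mapping with \(A\) and \(b\) can be written in plain Python, with the result clipped to the robot's joint limits. The matrix, offsets, and limits below are made-up values for a hypothetical case where three human joints drive two robot joints; the clipping step is an added assumption rather than part of the mapping itself.

```python
def retarget_joints(q_human, A, b, limits):
    """Compute q_robot = A @ q_human + b, then clip each joint to its limits."""
    q_robot = []
    for row, offset, (lo, hi) in zip(A, b, limits):
        value = sum(a * q for a, q in zip(row, q_human)) + offset
        q_robot.append(max(lo, min(hi, value)))
    return q_robot


if __name__ == "__main__":
    # Hypothetical mapping: robot joint 0 follows human joint 0 with an offset,
    # robot joint 1 averages human joints 1 and 2.
    A = [[1.0, 0.0, 0.0],
         [0.0, 0.5, 0.5]]
    b = [0.1, 0.0]
    limits = [(-1.0, 1.0), (-0.5, 0.5)]
    print(retarget_joints([0.2, 0.4, 0.8], A, b, limits))
```

A non-square `A` is what lets this handle robots with fewer (or more) joints than the tracked human skeleton.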
Optimization-Based Retargeting¶
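Solve an optimization problem at each frame. A typical form is:
\[ q_t^{*} = \arg\min_{q} \sum_i \left\| \text{FK}_i(q) - p_i^{\text{target}} \right\|^2 + \lambda \left\| q - q_{t-1} \right\|^2 \]
where \(\text{FK}_i\) is forward kinematics for keypoint \(i\), \(p_i^{\text{target}}\) is the desired position, and the second term, weighted by \(\lambda\), ensures smoothness by penalizing deviation from the previous frame's solution \(q_{t-1}\).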
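The per-frame optimization can be sketched on a toy problem. The example below assumes a planar two-link arm with two keypoints (elbow and wrist), a smoothness term that penalizes distance from the previous frame's joint angles with a weight `lam`, and plain gradient descent with finite-difference gradients. All of these are illustrative choices; a real system would use analytic Jacobians and a proper solver.

```python
import math

LINKS = (0.3, 0.25)  # hypothetical link lengths in metres


def fk_keypoints(q):
    """Elbow and wrist positions of a planar two-link arm."""
    ex = LINKS[0] * math.cos(q[0])
    ey = LINKS[0] * math.sin(q[0])
    wx = ex + LINKS[1] * math.cos(q[0] + q[1])
    wy = ey + LINKS[1] * math.sin(q[0] + q[1])
    return [(ex, ey), (wx, wy)]


def cost(q, targets, q_prev, lam):
    """Keypoint tracking error plus lam-weighted smoothness penalty."""
    track = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(fk_keypoints(q), targets))
    smooth = sum((a - b) ** 2 for a, b in zip(q, q_prev))
    return track + lam * smooth


def retarget_frame(targets, q_prev, lam=0.01, lr=0.5, iters=300, eps=1e-6):
    """Minimize the cost for one frame, warm-started from the previous solution."""
    q = list(q_prev)
    for _ in range(iters):
        grad = []
        for j in range(len(q)):
            q_plus = list(q)
            q_minus = list(q)
            q_plus[j] += eps
            q_minus[j] -= eps
            # Central finite difference for the j-th partial derivative.
            grad.append((cost(q_plus, targets, q_prev, lam)
                         - cost(q_minus, targets, q_prev, lam)) / (2 * eps))
        q = [qi - lr * g for qi, g in zip(q, grad)]
    return q


if __name__ == "__main__":
    targets = fk_keypoints([0.5, 0.4])  # keypoints from a known pose
    q_prev = [0.3, 0.3]
    q_new = retarget_frame(targets, q_prev)
    print("solution:", q_new)
    print("cost before:", cost(q_prev, targets, q_prev, 0.01))
    print("cost after:", cost(q_new, targets, q_prev, 0.01))
```

Warm-starting from the previous frame keeps the iteration count low in practice, since consecutive frames of human motion are usually close together.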
Data Quality for Learning¶
The quality of teleoperation data critically affects downstream policy learning.
Factors Affecting Data Quality¶
| Factor | Impact | Mitigation |
|---|---|---|
| Operator skill | Large: expert demonstrations are much more useful | Training, practice sessions |
| Control latency | Delays cause jerky, suboptimal motions | Low-latency hardware, predictive display |
| Workspace mismatch | Human and robot workspaces differ | Careful calibration, scaling |
| Recording artifacts | Noise, dropped frames, calibration errors | Post-processing, filtering |
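Two of the post-processing mitigations in the table can be sketched briefly: flagging dropped frames from timestamp gaps, and smoothing a noisy signal. The helpers below are hypothetical, and the exponential moving average is only one simple filter choice among many. It also adds lag, so it suits offline cleanup better than live control.

```python
def find_dropped_frames(timestamps, expected_dt, tolerance=0.5):
    """Return indices i where the gap since frame i-1 exceeds expected_dt * (1 + tolerance)."""
    threshold = expected_dt * (1.0 + tolerance)
    return [i for i in range(1, len(timestamps))
            if timestamps[i] - timestamps[i - 1] > threshold]


def ema_smooth(values, alpha=0.3):
    """Exponential moving average; smaller alpha smooths more but lags more."""
    if not values:
        return []
    smoothed = [values[0]]
    for x in values[1:]:
        prev = smoothed[-1]
        smoothed.append(prev + alpha * (x - prev))
    return smoothed


if __name__ == "__main__":
    stamps = [0.00, 0.02, 0.04, 0.10, 0.12]  # nominal 50 Hz with one gap
    print("gaps at indices:", find_dropped_frames(stamps, expected_dt=0.02))
    print("smoothed:", ema_smooth([0.0, 1.0, 0.0, 1.0, 0.0]))
```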
Best Practices¶
- Consistent setup: Same camera angles, lighting, object placement
- Multiple operators: Diverse demonstration styles improve generalization
- Task decomposition: Collect demonstrations for subtasks when full tasks are too long
- Quality filtering: Review and discard failed or low-quality demonstrations
- Annotation: Record task success/failure labels, phase boundaries, language descriptions
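The quality-filtering and annotation practices above could be captured in a small episode record. The schema below is a hypothetical sketch, not a standard dataset format: the field names, the duration cutoff, and the example values are all assumptions chosen to mirror the labels listed (success flag, phase boundaries, language description).

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    episode_id: str
    operator: str
    success: bool
    duration_s: float
    language: str = ""
    phase_boundaries: list = field(default_factory=list)  # frame indices


def filter_episodes(episodes, max_duration_s):
    """Keep successful episodes that are not unusually long."""
    return [e for e in episodes
            if e.success and e.duration_s <= max_duration_s]


if __name__ == "__main__":
    episodes = [
        Episode("ep0", "alice", True, 12.5, "pick up the cup", [0, 140, 310]),
        Episode("ep1", "bob", False, 9.0, "pick up the cup"),
        Episode("ep2", "bob", True, 48.0, "pick up the cup"),
    ]
    kept = filter_episodes(episodes, max_duration_s=30.0)
    print([e.episode_id for e in kept])
```

Recording the operator alongside each episode also makes it possible to check later that demonstrations from several operators are represented in the training set.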
Key References¶
- Zhao, T.Z., et al. (2023). "Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware." RSS.
- Wu, P., et al. (2024). "GELLO: A General, Low-Cost, and Intuitive Teleoperation Framework for Robot Manipulators." IROS.
- Chi, C., et al. (2024). "Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots." RSS.
- Cheng, X., et al. (2024). "Open-TeleVision: Teleoperation with Immersive Active Visual Feedback." arXiv.
- He, T., et al. (2024). "OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning." arXiv.