May 2026

SABER.

A Scalable Action-Based Embodied Dataset for Real-World VLA Adaptation — the first high-fidelity retail robotics action dataset built from natural human behavior, not teleoperation.

The Core Claim

Domain-specific robot deployment is fundamentally a data problem. High-fidelity naturalistic human behavior — systematically captured and retargeted — is a scalable foundation for robot adaptation. No robot in the loop required.

44.8K training samples · 100+ hours captured · 2.19× improvement

Resources

Watch Videos: in-store capture demos
arXiv: research paper
Download PDF: paper (local copy)
Dataset: SABER-10K on Hugging Face
Benchmark Results: RoboBenchMart evaluation
Need the full 44.8K corpus or custom capture? Contact Sales

Why Retail Demands Its Own Data
Modern VLAs like GR00T N1.6 achieve near-zero success on retail tasks out of the box — not because the model is weak, but because the retail domain is entirely absent from training data.

Distinct Skill Distribution

Articulated object interaction, multi-height shelf reaching, basket loading, floor retrieval, and context-dependent placement — all repeated across hundreds of SKUs in layouts no lab can replicate.

Long-Tail Scene Variation

Dense shelves, active restocking, occlusions, varied lighting, reflective packaging, and product deformability create real-world complexity that generic datasets cannot approximate.

Repetition Matters

A model must see skill families repeatedly across contexts — grasping bottles from different shelf heights, opening fridges from varied approach angles — to achieve reliable deployment.

Key Results at a Glance
2.19×: improvement in mean success rate over the RoboBenchMart fine-tuning-only baseline (0.293 vs. 0.134)
29.3%: mean success rate across all 10 retail manipulation tasks
91%: average fridge task success, up from a 43% baseline
100%: non-robot data; the SABER dataset itself was captured entirely from human video
44.8K total samples · 100+ capture hours · 3 action streams · 10 eval tasks
Three Complementary Action Streams
From the same dual-camera in-store captures, three distinct supervision signals are derived — each encoding a different level of kinematic abstraction.
Stream 1: LAPA Latent Actions (25K samples)

Embodiment-agnostic motion tokens derived via inverse-dynamics encoding from egocentric video. They capture whole-arm motion, reach trajectories, and grasping dynamics without robot joint labels.

Source: egocentric GoPro

Stream 2: Dexterous Hand Retargets (18.6K samples)

21-point hand landmarks are estimated, human-corrected frame by frame, then retargeted to robot joint space via Dex-Retargeting. This stream provides precise, explicit finger-level supervision.

Source: egocentric GoPro

Stream 3: Whole-Body Retargets (1.2K samples)

SMPL body parameters are estimated from the 360° ALIA view, human-corrected, and retargeted to the Unitree G1 humanoid. This stream provides torso-arm-leg coordination for floor retrieval and extended reach.

Source: exocentric ALIA 360°
From Store Footage to Robot Training
SABER is constructed from a dual-stream capture architecture — egocentric GoPro + exocentric ALIA 360° — across multiple real grocery stores.
1. In-Store Capture: 100+ hours across multiple real grocery stores with a head-mounted GoPro and the DreamVu ALIA 360°

2. Action Extraction: LAPA encoding, hand pose estimation, and SMPL body estimation, with human QC annotation

3. Robot Retargeting: Dex-Retargeting to robot hand joint space, plus SMPL-to-Unitree G1 whole-body retargeting

4. VLA Post-Training: shared-backbone multi-task training on GR00T N1.6 with a flow-matching objective
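
Step 4 names a flow-matching objective. As a minimal sketch of what such a loss looks like, assuming the common linear-interpolation (rectified-flow style) formulation and a flat action vector, the snippet below computes the per-sample loss. It is a generic illustration, not the GR00T N1.6 training code: `flow_matching_loss`, `predict_velocity`, the two stand-in models, and the toy values are all hypothetical names and numbers chosen for this example, and time conventions differ between implementations.

```python
from typing import Callable, List

Vector = List[float]


def flow_matching_loss(
    predict_velocity: Callable[[Vector, float], Vector],
    action: Vector,
    noise: Vector,
    t: float,
) -> float:
    """Mean squared error between predicted and target velocity at time t.

    Assumes the linear path x_t = (1 - t) * noise + t * action, whose
    velocity along the path is the constant (action - noise).
    """
    x_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
    target = [a - n for n, a in zip(noise, action)]
    predicted = predict_velocity(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(predicted, target)) / len(action)


# Toy values (hypothetical). In training, noise ~ N(0, I) and t ~ U(0, 1).
action = [1.0, -2.0]
noise = [0.5, 0.5]
t = 0.25


def zero_model(x_t: Vector, t: float) -> Vector:
    """Untrained stand-in that always predicts zero velocity."""
    return [0.0 for _ in x_t]


def oracle_model(x_t: Vector, t: float) -> Vector:
    """Ideal prediction for this single known action: (action - x_t) / (1 - t)."""
    return [(a - x) / (1.0 - t) for a, x in zip(action, x_t)]


zero_loss = flow_matching_loss(zero_model, action, noise, t)
oracle_loss = flow_matching_loss(oracle_model, action, noise, t)
print(zero_loss, oracle_loss)
```

The oracle stand-in only exists to show that the loss vanishes when the predicted velocity matches the path velocity; a real policy would condition on images and language rather than on the known action.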

Capture Sessions & Task Annotations
Annotated in-store capture footage from the SABER dataset — showing retail manipulation tasks with action labels and multi-scene diversity.
RoboBenchMart Results
SABER-MM post-training on GR00T N1.6 evaluated across 10 retail manipulation tasks spanning fridge, board-to-board, floor pick, and basket pick categories.
[Chart: SABER-MM vs. baseline (RoboBenchMart fine-tuning only). Mean success 29.3% vs. 13.4%; fridge 91% vs. 43%; floor pick 17% vs. 3%; 2.19× mean improvement over baseline.]
Task                        Category   Baseline (RBM FT)   SABER-MM   Change
fridge (avg open + close)   Fridge     0.43                0.91       +112%
board_to_board_duff         Board      0.10                0.10       0%
board_to_board_nestle       Board      0.02                0.02       0%
board_to_board_vanish       Board      0.02                0.11       +450%
pick_from_floor_beans       Floor      0.04                0.17       +325%
pick_from_floor_slam        Floor      0.02                0.17       +750%
pick_to_basket_fanta        Basket     0.08                0.19       +138%
pick_to_basket_nivea        Basket     0.08                0.21       +163%
pick_to_basket_stars        Basket     0.12                0.14       +17%
Mean (all tasks)            -          0.134               0.293      +119%

Note: the fridge row averages two tasks (open and close), so the mean is taken over 10 tasks with the fridge value counted twice.
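
The headline figures can be rechecked from the table with a few lines of Python. This is a minimal sketch, assuming the two fridge tasks (open and close) each enter the mean at the reported average, since the table gives only that average; the dictionary keys `fridge_open` and `fridge_close` and the helper names are introduced here for illustration.

```python
# (baseline, SABER-MM) success rates transcribed from the table above.
RESULTS = {
    # Assumption: only the open/close average is reported, so both
    # fridge tasks are entered at that average.
    "fridge_open": (0.43, 0.91),
    "fridge_close": (0.43, 0.91),
    "board_to_board_duff": (0.10, 0.10),
    "board_to_board_nestle": (0.02, 0.02),
    "board_to_board_vanish": (0.02, 0.11),
    "pick_from_floor_beans": (0.04, 0.17),
    "pick_from_floor_slam": (0.02, 0.17),
    "pick_to_basket_fanta": (0.08, 0.19),
    "pick_to_basket_nivea": (0.08, 0.21),
    "pick_to_basket_stars": (0.12, 0.14),
}


def relative_change(baseline: float, adapted: float) -> float:
    """Percent change of the adapted score relative to the baseline."""
    return (adapted - baseline) / baseline * 100.0


def mean(values) -> float:
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


baseline_mean = mean(b for b, _ in RESULTS.values())
saber_mean = mean(s for _, s in RESULTS.values())
improvement = saber_mean / baseline_mean

print(f"baseline mean: {baseline_mean:.3f}")
print(f"SABER-MM mean: {saber_mean:.3f}")
print(f"improvement:   {improvement:.2f}x")
for task, (b, s) in RESULTS.items():
    print(f"{task:24s} {relative_change(b, s):+.1f}%")
```

Under that assumption the means come out at 0.134 and 0.293, which reproduces the 2.19× figure.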
SABER-MM Data Composition
The post-training corpus combines SABER's three streams with robot-native anchor data and task-aligned demonstrations — totaling ~52.1K samples.
Source                       Samples   Share    Notes
SABER: LAPA Latent Actions   25K       48.0%    Egocentric video
SABER: Hand Retargets        18.6K     35.7%    Dex-Retargeting
SABER: Body Retargets        1.2K      2.3%     Unitree G1
NVIDIA Robot Data            4.8K      9.2%     Anchor signal
RoboBenchMart                2.5K      4.8%     Task-aligned
Total                        52.1K     100%
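
The shares above follow directly from the sample counts. A short sketch of the arithmetic, with counts in thousands taken from this section (the dictionary keys and variable names are illustrative only):

```python
# Sample counts in thousands, as listed in the composition breakdown.
COMPOSITION_K = {
    "SABER LAPA latent actions": 25.0,
    "SABER hand retargets": 18.6,
    "SABER body retargets": 1.2,
    "NVIDIA robot data": 4.8,
    "RoboBenchMart": 2.5,
}

total_k = sum(COMPOSITION_K.values())

# Percentage of the post-training corpus contributed by each source.
shares = {name: 100.0 * k / total_k for name, k in COMPOSITION_K.items()}

# The three SABER streams alone make up the human-video dataset.
saber_only_k = sum(
    k for name, k in COMPOSITION_K.items() if name.startswith("SABER")
)

print(f"total: {total_k:.1f}K, SABER only: {saber_only_k:.1f}K")
for name, pct in shares.items():
    print(f"{name:28s} {pct:.1f}%")
```

The SABER-only subtotal matches the 44.8K figure quoted for the dataset, and the remaining 7.3K samples are the robot-native anchor and task-aligned data.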
What SABER Demonstrates
Finding 01

Human Video Scales Where Teleoperation Can't

SABER demonstrates that high-fidelity naturalistic human behavior, systematically captured and retargeted, is a viable and scalable foundation for domain-specific robot adaptation — without a robot in the loop.

Finding 02

Three Streams Are Complementary

LAPA tokens capture whole-arm trajectory, Dex-Retargeting provides finger-level precision, and body retargets supply torso-arm-leg coordination. Together they provide non-overlapping kinematic information.

Finding 03

Robot-Native Anchor Stabilizes Training

The 4,800-sample robot-native anchor data proved necessary to stabilize early training even at SABER's scale, suggesting general manipulation signal matters for robust convergence.

Finding 04

Task Progress Beyond Binary Success

SABER-MM teaches models to progress further through each task sequence (mean P≥2/3 of 0.445 vs. 0.278 for the baseline), indicating that reaching and grasping are well learned while placement remains the frontier.
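
One plausible reading of a progress metric like P≥2/3 is sketched below. It assumes, and this page does not confirm, that each episode is scored by how many of a task's ordered stages it completes (for example reach, grasp, place) and that P≥2/3 is the share of episodes completing at least two thirds of them. The function name, the three-stage split, and the episode data are all hypothetical.

```python
from fractions import Fraction
from typing import Sequence


def progress_at_least(
    stages_completed: Sequence[int],
    total_stages: int,
    threshold: Fraction,
) -> float:
    """Share of episodes that completed at least `threshold` of the stages.

    Fractions are used so that a threshold such as 2/3 compares exactly.
    """
    hits = sum(
        1 for done in stages_completed
        if Fraction(done, total_stages) >= threshold
    )
    return hits / len(stages_completed)


# Hypothetical rollouts: stages completed out of 3 in each episode.
episodes = [0, 1, 2, 2, 3, 1, 0, 2]

p_two_thirds = progress_at_least(episodes, 3, Fraction(2, 3))
p_success = progress_at_least(episodes, 3, Fraction(1))

print(p_two_thirds, p_success)
```

With this kind of scoring, a gap between the progress share and full success points to failures concentrated in the final stage, which is consistent with placement being described as the frontier.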

Cite This Work
@article{dreamvu2026saber,
  title   = {SABER: A Scalable Action-Based Embodied Dataset
             for Real-World VLA Adaptation},
  author  = {Menga, Narsimha and Sakurikar, Parikshit and Rouhi, Amirreza
             and Reddy, Satya Sai and Govil, Anirudh and Chittajallu, Sri Harsha
             and Aggarwal, Rajat and Namboodiri, Anoop and Reddi, Sashi},
  year    = {2026},
  month   = {May},
  note    = {DreamVu Inc.},
  url     = {https://dreamvu.ai/saber}
}

Ready to Build the Data Layer for Retail Robots?

The SABER-10K subset is available now. Full dataset and code at dreamvu.ai/saber.

Download SABER-10K on Hugging Face · Full Paper & Dataset · Contact Sales