Hand Trajectory Loader

A specialized PyTorch dataset for loading human hand manipulation trajectories from the Lance columnar format, designed with two distinct data loading modes to serve different learning paradigms.

Why Two Data Loading Modes?

This dataset provides two fundamentally different loading mechanisms because supervised learning and reinforcement learning have incompatible data access patterns:

Mode 1: Standard PyTorch DataLoader (Supervised Learning)

Traditional supervised learning processes data in synchronized batches where all sequences advance together. This mode is ideal for tasks like trajectory prediction, behavior cloning, or motion analysis where you need the complete temporal context.

Mode 2: Custom RL Buffer (Parallel RL Environments)

Reinforcement learning with parallel environments requires:

  • Stateful tracking: Each environment maintains its own position within a trajectory
  • Asynchronous resets: When environment #5 completes its trajectory, only that specific trajectory should be replaced—not the entire batch
  • Persistent GPU memory: All trajectories remain in GPU memory for efficient indexed access

The custom RL buffer addresses these requirements, which standard DataLoaders cannot satisfy.

Installation

pip install -e .

Requirements:

  • Python 3.8+ (tested on Python 3.8)
  • PyTorch 1.10+
  • NumPy <1.24 (for chumpy compatibility)

Quick Start

from hand_trajectory_loader import HandTrajectoryDataset, create_dataloader

# Download dataset first (see Downloading Dataset section below)
# Or set MANO_DATASET_PATH=/path/to/dataset.lance

# Load dataset
dataset = HandTrajectoryDataset()

# Create DataLoader
dataloader = create_dataloader(dataset, batch_size=32, shuffle=True)

# Training loop
for batch in dataloader:
    mano_rotations = batch['mano_rotations']      # [B, T, 15, 3]
    object_pose = batch['object_pose']            # [B, T, 6]
    attention_mask = batch['attention_mask']      # [B, T]

    # Your model training here
    predictions = model(mano_rotations, object_pose)
    loss = criterion(predictions, targets)

Downloading Dataset

Download the DexCanvas dataset from HuggingFace Hub:

# Install download dependencies
pip install huggingface_hub tqdm

# Download dataset (public)
python scripts/download_dataset.py --repo-id DEXROBOT/DexCanvas

# Download to specific directory
python scripts/download_dataset.py \
    --repo-id DEXROBOT/DexCanvas \
    --output-dir ./data

# For private datasets, authenticate first
huggingface-cli login
# Or provide token directly
python scripts/download_dataset.py \
    --repo-id DEXROBOT/DexCanvas \
    --token YOUR_HF_TOKEN

The download script will:

  • Download the dataset to ~/.dexcanvas/datasets/ (or your specified directory)
  • Verify the downloaded file integrity
  • Provide instructions to set MANO_DATASET_PATH

Configuration

Dataset Path

The dataset path can be configured in two ways:

  1. Environment Variable (Recommended): Set the MANO_DATASET_PATH environment variable:

    export MANO_DATASET_PATH=/path/to/trajectories.lance
  2. Direct Parameter: Pass the path directly when creating the dataset:

    dataset = HandTrajectoryDataset(lance_path="/path/to/trajectories.lance")

If both are provided, the direct parameter takes precedence.
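
For example, a minimal sketch of the environment-variable route (the path is illustrative):

import os
from hand_trajectory_loader import HandTrajectoryDataset

os.environ["MANO_DATASET_PATH"] = "/data/trajectories.lance"  # illustrative path
dataset = HandTrajectoryDataset()  # no lance_path given, so the env var is used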

Mode 1: Standard DataLoader for Supervised Learning

Ideal for tasks where you need complete trajectory sequences with temporal context:

from hand_trajectory_loader import HandTrajectoryDataset, create_dataloader

# Create dataset (will use MANO_DATASET_PATH env var if lance_path not provided)
dataset = HandTrajectoryDataset(
    lance_path="/path/to/trajectories.lance",  # Optional if MANO_DATASET_PATH is set
    operators=["s01", "s02"],        # Filter by human operator
    objects=["cube2", "cylinder"],   # Filter by manipulated object
    manipulation_types=["01", "02"], # Filter by manipulation strategy
    min_rating=3.0,                  # Filter by trajectory quality rating
    min_frames=50,                   # Filter by minimum number of frames
    max_sequence_length=1024,        # Sequences padded/truncated to this length
    load_active_only=True,           # Load only frames with active manipulation
    device="cpu"                     # Keep on CPU for DataLoader
)

# Standard PyTorch DataLoader
dataloader = create_dataloader(
    dataset,
    batch_size=32,
    num_workers=4,
    shuffle=True
)

# Training loop (see examples/object_prediction.py)
for batch in dataloader:
    mano_rotations = batch['mano_rotations']      # [B, T, 15, 3] Hand joint rotations
    object_pose = batch['object_pose']            # [B, T, 6] Object 6D pose
    attention_mask = batch['attention_mask']      # [B, T] Marks valid vs padded frames
    fps = batch['fps']                            # [B] Frame rate for each trajectory

    # Forward pass through your model
    predictions = model(mano_rotations, object_pose)

    # Loss computation with attention mask to ignore padded frames
    # (criterion must return per-frame losses, e.g. reduction='none')
    loss = criterion(predictions, targets)
    loss = loss * attention_mask.unsqueeze(-1)
    loss = loss.sum() / attention_mask.sum()

Mode 2: RL Buffer for Parallel Environments

Essential for reinforcement learning where each environment progresses independently:

from hand_trajectory_loader.rl_buffer import RLTrajectoryBuffer

# Initialize buffer with persistent GPU storage
buffer = RLTrajectoryBuffer(
    dataset=dataset,
    num_envs=1024,              # Number of parallel environments
    max_trajectory_length=300,  # Shorter sequences for RL
    device="cuda"
)

# RL training loop (see examples/rl_training.py)
for step in range(num_steps):
    # Each environment at its own timestep
    obs, dones = buffer.step(auto_reset=True)

    # obs contains current frame for each environment
    mano_rotations = obs['mano_rotations']  # [1024, 15, 3]
    object_pose = obs['object_pose']        # [1024, 6]

    # Your policy network
    actions = policy(mano_rotations.flatten(1))

    # Physics simulation would go here
    # ...

    # Buffer automatically replaces only completed trajectories
    # Environment 5 might reset while others continue

Why Custom RL Buffer?

Standard DataLoaders reset all sequences simultaneously—incompatible with RL's asynchronous nature. The custom buffer maintains independent trajectory positions and only replaces completed trajectories, enabling true parallel environment simulation.
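
For example, the dones flags returned by step() let you clear per-environment state exactly where a trajectory was replaced. A minimal sketch, assuming dones is a boolean tensor of shape [num_envs]; the recurrent hidden state and policy signature are illustrative, not part of the buffer API:

import torch

hidden = torch.zeros(1024, 256, device="cuda")  # illustrative per-env recurrent state

obs, dones = buffer.step(auto_reset=True)
hidden[dones] = 0.0  # clear state only for environments whose trajectory just reset
actions = policy(obs['mano_rotations'].flatten(1), hidden)  # hypothetical recurrent policy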

Lance Format Storage

The dataset uses Lance columnar format for efficient storage and retrieval:

Advantages

  • Fast metadata reading: ~2ms to read all trajectory metadata (vs ~100ms for directory scanning)
  • Selective column loading: 23x faster loading by excluding mesh_vertices (219ms → 10ms per trajectory)
  • Efficient filtering: Query by rating, FPS, frames, or any metadata field
  • Compact storage: Columnar compression reduces storage by ~40%
  • Row-based indexing: Direct integer indexing into dataset rows (no string operations)

Dataset Filtering

The Lance format enables powerful metadata-based filtering:

# Filter by quality rating
dataset = HandTrajectoryDataset(
    lance_path="trajectories.lance",
    min_rating=3.5,  # Only trajectories rated 3.5 or higher
)

# Get dataset statistics
stats = dataset.get_statistics()
print(f"Total trajectories: {stats['total_trajectories']}")
print(f"Average rating: {stats['avg_rating']:.2f}")
print(f"Average frames: {stats['avg_frames']:.1f}")

# Find trajectories by specific criteria
high_quality_indices = dataset.filter_by_rating(min_rating=4.0)
target_fps_indices = dataset.filter_by_fps(target_fps=100.0)

Key Attributes

Trajectories are organized by:

  • Operator: Human subject performing the manipulation (s01, s02, etc.)
  • Object: Item being manipulated (cube2, cylinder, etc.)
  • Manipulation Type: Different strategies for manipulating the same object (01, 02, 03)
  • Rating: Quality score for the trajectory
  • FPS: Frame rate (typically 100 Hz)

These attributes let you control data diversity—train on specific operator-object combinations or generalize across all variations.
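
For example, a sketch of an operator-level split for testing generalization, using the filter parameters from Mode 1 (operator and object IDs are illustrative):

train_set = HandTrajectoryDataset(operators=["s01", "s02"], objects=["cube2"])
eval_set = HandTrajectoryDataset(operators=["s03"], objects=["cube2"])  # held-out operator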

Handling Variable-Length Sequences

Real manipulation trajectories vary in length. The dataset provides two mechanisms:

Padding with Attention Masks

Sequences are padded to max_sequence_length with attention masks distinguishing real data from padding:

# attention_mask[t] = 1.0 for real data, 0.0 for padding

# Handle padding properly in the loss computation
# (criterion must return per-frame losses, e.g. reduction='none')
loss = criterion(predictions, targets)
loss = loss * attention_mask.unsqueeze(-1)  # Zero out padded positions
loss = loss.sum() / attention_mask.sum()    # Average only over real frames

Active Data Range Loading

Most trajectories contain idle frames before/after actual manipulation. Enable load_active_only=True to load only meaningful frames:

# Focus on manipulation, skip idle frames
dataset = HandTrajectoryDataset(
    lance_path="trajectories.lance",
    load_active_only=True,  # Load only active manipulation frames
    max_sequence_length=512  # Can use smaller buffers
)

# Reduces memory usage by 50-75% and accelerates training

Data Format

Lance Dataset Structure

The Lance dataset contains trajectories in columnar format with the following schema:

  • trajectory_meta_data: Metadata dictionary containing:

    • mocap_raw_data_source: Source information (operator, object, gesture, sequence)
    • data_fps: Frame rate in Hz
    • total_frames: Number of frames
    • rating: Quality rating
    • object_move_start_frame: Active manipulation start
    • object_move_end_frame: Active manipulation end
    • mano_hand_shape: Static hand shape parameters [10]
  • sequence_info.timestamp: Frame timestamps
  • sequence_info.hand_joint.position: Hand position [T, 3]
  • sequence_info.hand_joint.rotation: Hand rotation [T, 3]
  • sequence_info.hand_joint.finger_pose: Finger pose parameters [T, 45]
  • sequence_info.object_info.position: Object position [T, 3]
  • sequence_info.object_info.euler_angle: Object rotation [T, 3]
  • sequence_info.mano_model_output.joints: Joint positions [T, 21, 3] (optional)
  • sequence_info.mano_model_output.mesh_vertices: Mesh vertices [T, 778, 3] (not loaded by default)

Output Tensors

The dataset provides consistently formatted tensors:

  • mano_rotations: [T, 15, 3] - 15 hand joints × 3 axis-angle parameters
  • mano_shape: [T, 10] - Hand shape parameters
  • mano_translation: [T, 3] - Hand position
  • mano_global_rotation: [T, 3] - Hand orientation
  • object_pose: [T, 6] - Object 6D pose
  • attention_mask: [T] - Valid frame indicator
  • fps: Frame rate in Hz (typically 100 Hz)
  • time_offset: Time offset in seconds when using active loading
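
A quick shape check on a single sample (a sketch, assuming __getitem__ returns the same keys as a batch; T is max_sequence_length):

sample = dataset[0]
print(sample['mano_rotations'].shape)  # torch.Size([T, 15, 3])
print(sample['object_pose'].shape)     # torch.Size([T, 6])
print(sample['attention_mask'].sum())  # number of valid (non-padded) frames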

Timing Information

The dataset automatically:

  • Parses timestamps from Lance dataset
  • Calculates FPS from metadata (~100 Hz)
  • Converts timestamps to relative seconds from trajectory start
  • Provides time offsets when using load_active_only=True

Users can convert between frame indices and time using:

time_in_seconds = frame_index / fps
frame_index = int(time_in_seconds * fps)
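
With load_active_only=True, add the sample's time_offset (see Output Tensors) to recover time relative to the full recording:

absolute_time = time_offset + frame_index / fps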

Performance Optimization

The LanceAdapter uses selective column reading for optimal performance:

# Columns loaded by default (10ms per trajectory)
columns = [
    'trajectory_meta_data',
    'sequence_info.timestamp',
    'sequence_info.hand_joint.position',
    'sequence_info.hand_joint.rotation',
    'sequence_info.hand_joint.finger_pose',
    'sequence_info.object_info.position',
    'sequence_info.object_info.euler_angle',
    'sequence_info.mano_model_output.joints',
]

# mesh_vertices is excluded by default (would add 200ms per trajectory)
# Only load if you specifically need mesh data

This selective reading provides a 23x speedup compared to loading all columns.
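
If you do need the mesh data, one option is to read that column directly with the pylance API. A minimal sketch, assuming the schema above and that your installed pylance version supports nested column selection:

import lance

ds = lance.dataset("/path/to/trajectories.lance")
# Explicitly request the heavy column that the loader skips by default
table = ds.to_table(columns=["sequence_info.mano_model_output.mesh_vertices"])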

Visualization

Visualize hand trajectories with 3D rendering using Open3D:

# Clone with submodules to get MANO models and object meshes
git clone --recursive [email protected]:ai/hand_trajectory_loader.git

# Or if already cloned, initialize submodules
git submodule update --init --recursive

# Install visualization dependencies (or use pip install -e . to install all deps)
pip install open3d trimesh scipy chumpy

# Visualize a trajectory (use your downloaded dataset path)
python examples/visualize_trajectory.py ~/.dexcanvas/datasets/mocap_ver0.1.parquet 0 \
    --mano-model assets/mano/models/MANO_RIGHT.pkl \
    --object assets/objects/cube2.stl \
    --show-joints \
    --max-sequence-length 2048

Visualization Controls

  • SPACE - Pause/Resume animation
  • M - Toggle hand mesh visibility
  • O - Toggle object mesh visibility
  • Q - Quit (or close window)

Available Assets

The assets submodule contains:

  • MANO models: assets/mano/models/MANO_LEFT.pkl, MANO_RIGHT.pkl
  • Object meshes: assets/objects/*.stl (cube, cylinder, sphere, etc.)

Examples

See complete working examples:

  • examples/visualize_trajectory.py - 3D visualization of hand trajectories
  • examples/object_prediction.py - Supervised learning for trajectory prediction
  • examples/rl_training.py - Parallel RL environment training

Testing

Current Status (Python 3.8):

  • ✅ RLTrajectoryBuffer: All 13 tests passing
  • ⚠️ Dataset tests: Legacy NPY format tests (deprecated, use Lance/Parquet format)

# Run RL Buffer tests (recommended)
pytest tests/test_rl_buffer.py -v

# Note: test_dataset.py and test_active_loading.py are for legacy NPY format
# Current version uses Lance/Parquet format - see examples/ for usage

The project has migrated from NPY to Lance/Parquet format. Legacy dataset tests are kept for reference but use deprecated APIs.

License

MIT
