Franka Kitchen Pick-and-Place Task implemented in IsaacLab using a Behavior Cloning-based Imitation Learning Policy
- This repository contains the implementation, data-generation pipelines, and trained behavior-cloning models for a kitchen pick-and-place task using IsaacLab.
- Task: a Franka Emika manipulator picks a tomato soup can (container) from inside a fridge and places it inside the microwave cavity.
- We reused the IsaacLab sample Franka Lift Cube task and adapted it to our kitchen task by modifying IsaacLab's internal source code.
- To reuse the code, clone the IsaacLab 2.1.1 release with Isaac Sim 4.5, and replace the existing source code with the files and folders under IsaacLab_internal_source.
- The folders outside IsaacLab_internal_source recreate a similar kitchen scene with a Kinova Gen3 7-DOF arm as an external IsaacLab project, primarily intended for an RL PPO implementation (not implemented yet).
- A custom kitchen scene (shelf, fridge, microwave, tomato soup can) with a Franka Emika Panda arm is provided.
- Teleoperated demonstrations were recorded, annotated (Isaac Mimic tooling), and expanded with Isaac Lab Mimic generation to produce ~300 annotated demonstrations (from 10 teleoperated trials, with parallelized generation).
- Behavior cloning training was performed using robomimic with multiple configurations (state-based and visuomotor image-based policies). Several 'fast' and 'ultrafast' variants were used to trade off training time against accuracy.
- main (default) - Main/production branch with stable code containing the state-based and visuomotor implementations for the Franka kitchen task
- dev (stale) - Not used
- il_bc_visuomotor - Imitation Learning with Behavior Cloning, state-based and visuomotor policy implementations; contains code for MimicGen and robomimic IsaacLab envs & configs
- kitchen_scene - Basic kitchen environment scene implementation with the Kinova arm; contains code specific to teleoperation of the Kinova robot in IsaacLab using absolute and relative IK
- rl_ppo_franka - Reinforcement Learning with the PPO algorithm for the Franka robot on the kitchen task (not tested)
- robocasa - RoboCasa simulation environment integration with the Kinova arm, the kitchen scene, and a basic teleoperation script
- Record teleop demos (example using the teleoperation script):

```bash
./isaaclab.sh -p scripts/environments/teleoperation/teleop_se3_agent.py \
  --task Isaac-Lift-Kitchen-Franka-IK-Rel-v0 \
  --num_envs 1 --enable_cameras
```

- Annotate teleop demos (example):

```bash
./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/annotate_demos.py \
  --task Isaac-Kitchen-Lift-Franka-IK-Rel-Visuomotor-Mimic-v0 \
  --input_file ./datasets/kitchen_task_vision_11_.hdf5 \
  --output_file ./datasets/annotated_dataset_modified2.hdf5 \
  --enable_cameras
```

- Generate the expanded dataset (Isaac Lab Mimic generation, 10 parallel envs -> 300 trials):

```bash
./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/generate_dataset.py \
  --enable_cameras --headless --num_envs 10 --generation_num_trials 300 \
  --input_file ./datasets/annotated_dataset_modified2.hdf5 \
  --output_file ./datasets/generated_dataset_large.hdf5
```

Below is a screenshot showing the 300-trial generated dataset replayed using the replay demo script (5 parallel envs). The command used to run the replay demo is shown for reference.

Replay command used:

```bash
./isaaclab.sh -p scripts/tools/replay_demos.py \
  --task Isaac-Lift-Kitchen-Franka-IK-Rel-v0 \
  --dataset_file ./datasets/generated_dataset_sample_300.hdf5 --enable_cameras --num_envs 5
```
Note: when expanding 10 manually annotated teleoperated demonstrations into 300 trials with Isaac Lab Mimic generation, the generator produces both successful and failed trials, and the generated dataset includes both kinds; a trial is considered successful based on the success-term function `object_in_microwave_and_hand_out()`.
Below is a visualization showing the dataset expansion process (success & failure trials):
- Split the data using the script from the robomimic repo (train/validation, 1:10 ratio):

```bash
./isaaclab.sh -p split_train_val.py \
  --dataset ./datasets/generated_dataset_sample_300.hdf5 --ratio 0.1
```

- Example training (state-based BC):
```bash
./isaaclab.sh -p scripts/imitation_learning/robomimic/train.py \
  --task Isaac-Lift-Kitchen-Franka-IK-Rel-v0 \
  --algo bc \
  --dataset ./datasets/generated_dataset_sample_300.hdf5
```

- Example play / evaluation (state-based BC):
```bash
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py \
  --task Isaac-Lift-Kitchen-Franka-IK-Rel-v0 --num_rollouts 50 \
  --checkpoint /path/to/models/model_epoch_best_validation.pth \
  --horizon 2500 --enable_cameras
```

- Visuomotor ultra-fast training/play example:
```bash
./isaaclab.sh -p scripts/imitation_learning/robomimic/train.py \
  --task Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-UltraFast-v0 \
  --algo bc --dataset ./datasets/generated_dataset_sample_300.hdf5

./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py \
  --task Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-UltraFast-v0 --num_rollouts 50 \
  --checkpoint /path/to/models/best_validation_epoch.pth --enable_cameras
```

- A mimic env cfg and mimic env were added to support dataset expansion (Isaac Lab Mimic generation) from a small set of annotated trials. The mimic env supports generating annotated datasets with cameras enabled and parallel envs.
- Subtask breakdown in the mimic environment: move to fridge, grasp tomato can, move to microwave, place in microwave (see `franka_kitchen_lift_visuomotor_mimic_env_cfg.py`; a configuration sketch follows this list).
- Custom success detection function `object_in_microwave_and_hand_out()` verifies tomato-can placement, gripper openness, and hand clearance (see `kitchen_joint_pos_env_cfg.py`; a sketch also follows this list).
- Custom observation functions: `object_position_in_robot_root_frame()`, `ee_frame_pos()`, `ee_frame_quat()`, `gripper_pos()` (see `mdp/observations.py`).
- Asset reuse from Lightwheel: the Franka kitchen scene reuses assets from the Kinova folder for consistency and compatibility (see `config/kinova/assets/`).
- Scripts & logs used to generate the 300 annotated demos from 10 teleoperated trials were added; logs are available in `log_dir/`.
- Robomimic BC configs were added for state-based and visuomotor policies (multiple RNN and ResNet-18 + R3M variants).
- Lighting randomization, scene scaling for parallel envs, and wrist-camera optimizations were added for better generalization and stable data collection.
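As a rough illustration of the subtask breakdown above, here is a minimal sketch using IsaacLab's Mimic config classes (`MimicEnvCfg`/`SubTaskConfig`). The signal names, object references, and offset ranges are assumptions for illustration, and the four subtasks are condensed into two motion segments for brevity; the actual values live in `franka_kitchen_lift_visuomotor_mimic_env_cfg.py`.

```python
# Hypothetical sketch of the Mimic subtask breakdown (signal names, object refs,
# and offsets are illustrative; see franka_kitchen_lift_visuomotor_mimic_env_cfg.py).
from isaaclab.envs.mimic_env_cfg import MimicEnvCfg, SubTaskConfig

# In the real config this is mixed into the task's env cfg; shown standalone here.
cfg = MimicEnvCfg()
cfg.subtask_configs["franka"] = [
    # Reach into the fridge and grasp the tomato soup can.
    SubTaskConfig(
        object_ref="object",               # the can anchors this motion segment
        subtask_term_signal="grasp",       # boolean signal recorded during annotation
        subtask_term_offset_range=(10, 20),
    ),
    # Carry the can to the microwave and place it inside. The final subtask
    # carries no termination signal; it ends with the episode.
    SubTaskConfig(
        object_ref="microwave",
        subtask_term_signal=None,
        subtask_term_offset_range=(0, 0),
    ),
]
```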
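And a minimal sketch of what a success term like `object_in_microwave_and_hand_out()` could look like in IsaacLab's manager-based MDP style. The scene-entity names, cavity center, and distance thresholds are assumptions, and the gripper-openness check is omitted for brevity; the actual logic lives in `kitchen_joint_pos_env_cfg.py`.

```python
# Hypothetical success-term sketch (entity names and thresholds are assumptions;
# see kitchen_joint_pos_env_cfg.py for the actual implementation).
import torch
from isaaclab.envs import ManagerBasedRLEnv
from isaaclab.managers import SceneEntityCfg


def object_in_microwave_and_hand_out(
    env: ManagerBasedRLEnv,
    object_cfg: SceneEntityCfg = SceneEntityCfg("object"),
    ee_frame_cfg: SceneEntityCfg = SceneEntityCfg("ee_frame"),
    microwave_center: tuple[float, float, float] = (0.6, -0.4, 0.9),  # assumed cavity center (world frame)
    in_radius: float = 0.08,   # assumed: can must sit within 8 cm of the cavity center
    clearance: float = 0.15,   # assumed: hand must retreat at least 15 cm from the can
) -> torch.Tensor:
    """Success when the can sits in the microwave cavity and the gripper has pulled out."""
    obj = env.scene[object_cfg.name]
    ee_frame = env.scene[ee_frame_cfg.name]
    obj_pos = obj.data.root_pos_w                    # (num_envs, 3)
    ee_pos = ee_frame.data.target_pos_w[..., 0, :]   # (num_envs, 3) end-effector position
    center = torch.tensor(microwave_center, device=env.device)
    can_in_microwave = torch.norm(obj_pos - center, dim=-1) < in_radius
    hand_out = torch.norm(ee_pos - obj_pos, dim=-1) > clearance
    # (gripper-openness check omitted for brevity)
    return can_in_microwave & hand_out
```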
- IsaacLab_internal_source/
- lift/ — core lift task code and configs
- lift/lift_env_cfg.py — base environment config with scene setup, MDP settings
- lift/mdp/ — custom MDP components
- custom_events.py — light-intensity randomization for scene variation (a sketch appears after this listing)
- observations.py — custom observation functions (object position, EE frame, gripper pos); a sketch also appears after this listing
- rewards.py, terminations.py — reward and termination functions
- lift/config/franka/ — Franka-specific envs and kitchen scene
- kitchen_scene.py — kitchen scene (table, shelf, fridge, microwave, tomato soup can) with Franka arm
- kitchen_joint_pos_env_cfg.py — joint-position kitchen env for Franka with camera config; contains the simulation logic for all parts of the project
- kitchen_ik_rel_env_cfg.py — relative IK env for Franka (teleop/eval)
- kitchen_teleop_env_cfg.py — teleoperation env + tweaks for demo collection
- agents/ — agent config files for training (robomimic / rsl / sb3 / skrl)
- `__init__.py` — Gym environment registrations
- lift/config/kinova/ — original kinova assets reused by the kitchen scene
- isaaclab_mimic/ — mimic environment for dataset generation
- envs/ — mimic environment implementations
- franka_kitchen_lift_visuomotor_mimic_env_cfg.py — visuomotor mimic env config with subtask breakdown
- franka_kitchen_lift_visuomotor_mimic_env.py — environment wrapper for annotation and data generation for the kitchen task using Isaac Lab Mimic
- `__init__.py` — Gym registration for mimic environments
- imitation_learning/isaaclab_mimic/ — mimic annotation + generation scripts
- annotate_demos.py, generate_dataset.py, consolidated_demo.py — utilities for annotation and dataset generation
- trained_bc_models/ — trained BC artifacts and zipped bundles; contains configs & logs. Models are attached separately to the GitHub releases.
- log_dir/ — logs generated during Isaac Lab Mimic annotation and dataset-generation pipelines, plus rollout logs (validation with the trained BC IL models).
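As referenced in the listing above, here is a minimal sketch of what a light-intensity randomization event term could look like. The prim path, intensity range, and USD attribute access are assumptions for illustration; the project's actual version is in custom_events.py.

```python
# Hypothetical light-randomization event term (prim path and intensity range are
# assumptions; see custom_events.py for the project's actual implementation).
import torch
import omni.usd
from isaaclab.envs import ManagerBasedEnv


def randomize_light_intensity(
    env: ManagerBasedEnv,
    env_ids: torch.Tensor,
    intensity_range: tuple[float, float] = (3000.0, 10000.0),
    light_prim_path: str = "/World/Light",  # assumed path to the scene's light
):
    """Sample a new intensity and write it to the light prim's USD attribute.

    The light here is scene-global (shared across cloned envs), so env_ids is unused.
    """
    low, high = intensity_range
    intensity = float(torch.empty(1).uniform_(low, high))
    stage = omni.usd.get_context().get_stage()
    light_prim = stage.GetPrimAtPath(light_prim_path)
    light_prim.GetAttribute("inputs:intensity").Set(intensity)
```

Such a term would typically be registered in the env config's events section via an `EventTermCfg` with `mode="reset"`, so the lighting changes on every episode reset.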
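Likewise, a sketch of the object-position observation expressed in the robot's root frame, following the pattern used by IsaacLab's stock lift task (the scene-entity names are assumptions; see mdp/observations.py for the project's version):

```python
# Sketch of an object-position observation in the robot root frame, following
# IsaacLab's lift-task pattern (scene-entity names are assumptions).
import torch
from isaaclab.envs import ManagerBasedRLEnv
from isaaclab.managers import SceneEntityCfg
from isaaclab.utils.math import subtract_frame_transforms


def object_position_in_robot_root_frame(
    env: ManagerBasedRLEnv,
    robot_cfg: SceneEntityCfg = SceneEntityCfg("robot"),
    object_cfg: SceneEntityCfg = SceneEntityCfg("object"),
) -> torch.Tensor:
    """Tomato-can position transformed from the world frame into the robot's root frame."""
    robot = env.scene[robot_cfg.name]
    obj = env.scene[object_cfg.name]
    object_pos_w = obj.data.root_pos_w[:, :3]
    object_pos_b, _ = subtract_frame_transforms(
        robot.data.root_state_w[:, :3], robot.data.root_state_w[:, 3:7], object_pos_w
    )
    return object_pos_b
```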
Please find the model results here
Training Stats (Example):
```
Epoch 599 memory usage: 3023 MB
Train epoch 600:      loss ≈ -22, log-likelihood ≈ 22, policy grad norms ≈ 7465
Validation epoch 600: loss ≈ 413,689, log-likelihood ≈ -413,689
Checkpoint saved at epoch 600; memory usage stable at 3023 MB.
```
Observations:
- Training loss remained around -22 for many epochs, while validation loss stayed in the hundreds of thousands, with no significant improvement across all 5 configs (state-based and visuomotor policies).
- State-based policy: the Franka arm moved near the fridge and attempted to grasp the can, but missed every time. Adding a camera and training a visuomotor policy was expected to help, but the results were sometimes worse than the state-based policy's.
- Visuomotor policy: performance was similar across configs, regardless of RNN size. Pretrained R3M visual features led to less erratic movement, but arm motion was slow and grasps still failed, likely due to the lack of depth input. Training the visual policy from scratch performed worse than initializing with R3M weights.
Evaluation Command Example:
```bash
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py \
  --task Isaac-Lift-Kitchen-Franka-IK-Rel-v0 --num_rollouts 20 \
  --checkpoint /path/to/models/model_epoch_best_validation.pth \
  --horizon 2500 --enable_cameras
```

Full evaluation for all 5 configs (state & visuomotor):
```bash
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py --task Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-UltraFast-v0 --num_rollouts 20 --checkpoint /home/saiga/Documents/Cognitive_Robotics/simulation_packages/IsaacLab/logs/robomimic/Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-UltraFast-v0/bc_rnn_image_franka_kitchen_lift_ultrafast/20250908045749/models/model_epoch_140.pth --enable_cameras --horizon 2400 --headless && \
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py --task Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-v0 --num_rollouts 20 --checkpoint /home/saiga/Documents/Cognitive_Robotics/simulation_packages/IsaacLab/logs/robomimic/Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-v0/bc_rnn_image_franka_kitchen_lift/20250908100825/models/model_epoch_435_best_validation_245846.2421875.pth --enable_cameras --horizon 2400 --headless && \
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py --task Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-Fast-v0 --num_rollouts 20 --checkpoint /home/saiga/Documents/Cognitive_Robotics/simulation_packages/IsaacLab/logs/robomimic/Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-Fast-v0/bc_rnn_image_franka_kitchen_lift_fast/20250907053339/models/model_epoch_78_best_validation_198896.4625.pth --enable_cameras --horizon 2400 --headless && \
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py --task Isaac-Lift-Kitchen-Franka-IK-Rel-v0 --num_rollouts 20 --checkpoint /home/saiga/Documents/Cognitive_Robotics/simulation_packages/IsaacLab/logs/robomimic/Isaac-Lift-Kitchen-Franka-IK-Rel-v0/bc_rnn_low_dim_franka_kitchen_lift/20250907215215/models/model_epoch_858_best_validation_602055.42890625.pth --enable_cameras --horizon 2400 --headless && \
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py --task Isaac-Lift-Kitchen-Franka-IK-Rel-Fast-v0 --num_rollouts 20 --checkpoint /home/saiga/Documents/Cognitive_Robotics/simulation_packages/IsaacLab/logs/robomimic/Isaac-Lift-Kitchen-Franka-IK-Rel-Fast-v0/bc_rnn_low_dim_franka_kitchen_lift_fast/20250907205615/models/model_epoch_75_best_validation_199577.3265625.pth --enable_cameras --horizon 2400 --headless
```

See the linked video for more details and rollout results.
Rollout results for all of the above configs can be found in IsaacLab_internal_source/log_dir/bc_rollout_logs.txt.
| Config Name | Input Modality | RNN Hidden Dim / Layers | GMM Modes | Visual Encoder | Pretrained Weights | Key Advantage / Limitation |
|---|---|---|---|---|---|---|
| bc_rnn_low_dim_franka_kitchen_lift_fast | State (low_dim) | 512 / 2 | 5 | None | N/A | Fast training, simple, but lacks visual context |
| bc_rnn_low_dim_franka_kitchen_lift | State (low_dim) | 1000 / 3 | 8 | None | N/A | Larger RNN, more modes, but still no visual input |
| bc_rnn_image_franka_kitchen_lift_ultrafast | Visuomotor (RGB) | 32 / 1 | 3 | R3MConv (resnet18) | Yes | Ultrafast, small RNN, uses pretrained visual features |
| bc_rnn_image_franka_kitchen_lift_fast | Visuomotor (RGB) | 128 / 1 | 5 | R3MConv (resnet18) | Yes | Fast, moderate RNN, pretrained visual features |
| bc_rnn_image_franka_kitchen_lift | Visuomotor (RGB) | 1000 / 3 | 8 | ResNet18Conv | No | Large RNN, visual features trained from scratch |
Summary:
- State-based configs (low_dim) are faster to train and simpler, but lack visual context, which limits grasping performance.
- Visuomotor configs (RGB) use wrist-camera input; those with pretrained R3MConv visual features (ultrafast, fast) showed more stable and less erratic movement, but moved slowly and still failed to grasp, likely due to the lack of depth input.
- The largest visuomotor config (normal) trained visual features from scratch, but did not outperform the pretrained configs and was more erratic.
- Increasing RNN size and GMM modes led to more natural movement, but did not improve task completion.
- Overall, using pretrained R3M weights (ultrafast, fast) provided the most stable visuomotor policies, but none of the configs achieved successful task completion in rollouts.
- This suggests that, given the current results, successful task completion will likely require RL (PPO) training initialized from these BC policy weights.
This section summarizes the most important configuration options for behavior cloning (BC) in IsaacLab/robomimic tasks:
- `algo_name`: Algorithm type (usually `bc` for behavior cloning).
- `experiment`: Controls logging, saving, validation, and rollout settings.
  - `validate`: Enables validation during training.
  - `logging`: Options for TensorBoard and terminal output.
  - `save`: When and how to save checkpoints (e.g., every N epochs, on best validation).
  - `epoch_every_n_steps`: Number of steps per training epoch.
- `train`: Data loading and training-loop settings.
  - `num_data_workers`: Number of CPU workers for data loading (higher = faster, up to available cores).
  - `hdf5_cache_mode`: What to cache in RAM (`all`, `low_dim`, or `None`).
  - `batch_size`: Number of samples per training batch (higher = better GPU utilization, but more VRAM needed).
  - `num_epochs`: Total number of training epochs.
  - `seq_length`: Length of the input sequence for RNNs (higher = more temporal context).
- `algo`: Model architecture and optimization.
  - `optim_params`: Learning rate, decay schedule, and regularization.
  - `actor_layer_dims`: Hidden layer sizes for the policy network.
  - `gmm`: Number of Gaussian Mixture Model modes (higher = more action diversity).
  - `rnn`: RNN settings (enabled, hidden size, layers, etc.).
- `observation`: Defines input modalities and encoders.
  - `modalities`: Which observation types are used (`low_dim`, `rgb`, etc.).
  - `encoder`: Backbone network (e.g., `ResNet18Conv`, `R3MConv`), feature dimension, pooling, and randomization/cropping.
- `batch_size`: Larger batches speed up training and improve stability, but require more GPU memory.
- `num_data_workers`: More workers reduce data-loading bottlenecks, especially for image-based BC.
- `hdf5_cache_mode`: Caching images (`all`) is fastest but uses a lot of RAM; `low_dim` is safer for large datasets.
- `gmm.num_modes`: More modes allow the policy to represent more complex/multimodal actions.
- `rnn.hidden_dim`/`layers`: Larger RNNs capture more temporal dependencies but use more memory.
- `encoder.backbone_class`: The choice of backbone affects visual feature quality and training speed.

See the config files in isaaclab_tasks/manager_based/manipulation/lift/config/franka/agents/robomimic/ for examples and recommended settings for different hardware and task complexity. A minimal sketch of the config structure follows.
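For orientation, here is how these fields nest in a robomimic BC-RNN config, shown as a Python dict mirroring the JSON layout. The values are illustrative examples, not this project's tuned settings; consult the `agents/robomimic/` config files for those.

```python
# Illustrative robomimic BC-RNN config skeleton (values are examples, not the
# project's tuned settings; see the agents/robomimic/ config files for those).
bc_rnn_config = {
    "algo_name": "bc",
    "experiment": {
        "validate": True,                  # run validation during training
        "epoch_every_n_steps": 100,        # gradient steps per "epoch"
        "save": {"enabled": True, "every_n_epochs": 50, "on_best_validation": True},
    },
    "train": {
        "num_data_workers": 4,
        "hdf5_cache_mode": "low_dim",      # cache only low-dim obs; safer for image datasets
        "batch_size": 16,
        "num_epochs": 600,
        "seq_length": 10,                  # temporal context for the RNN
    },
    "algo": {
        "optim_params": {"policy": {"learning_rate": {"initial": 1e-4}}},
        "gmm": {"enabled": True, "num_modes": 5},
        "rnn": {"enabled": True, "hidden_dim": 512, "num_layers": 2},
    },
    "observation": {
        "modalities": {
            "obs": {
                "low_dim": ["eef_pos", "eef_quat", "gripper_pos"],  # assumed key names
                "rgb": ["wrist_cam"],                               # assumed camera key
            }
        },
        "encoder": {"rgb": {"core_kwargs": {"backbone_class": "ResNet18Conv"}}},
    },
}
```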
- Camera config: the wrist camera must have valid `width` and `height` values; the defaults used in this project are 88x88.
- Asset paths: the Franka kitchen scene reuses assets in `lift/config/kinova/assets/`; keep the relative paths intact.
- Mimic generation: generating datasets with images can be memory/GPU intensive. Use headless mode and set `num_envs` according to available GPU resources.
- Validation: BC checkpoints were selected by validation loss; ensure the model and config match the observation/action spaces when loading.
- Ensure the Isaac Sim / IsaacLab environment is installed and `isaaclab.sh` works.
- Record teleop demos using `scripts/environments/teleoperation/teleop_se3_agent.py`.
- Annotate demos via `annotate_demos.py`.
- Run `generate_dataset.py` with `--num_envs` parallelization to expand the annotated trials.
- Split the dataset using the robomimic split script.
- Run the robomimic training scripts (configs available in `agents/` and `trained_bc_models/*/config.json`).
- Evaluate with `robomimic/play.py` using the selected checkpoint.
The dataset consists of annotated demonstrations for the kitchen pick-and-place task, generated using teleoperation and expanded via the mimic environment. Key files:
- `datasets/kitchen_task_vision_11_.hdf5` — raw teleop demonstrations
- `datasets/annotated_dataset_modified2.hdf5` — annotated dataset after processing
- `datasets/generated_dataset_large.hdf5` — expanded dataset (300 trials, 10 parallel envs)
Each dataset contains:
- Low-dimensional observations (eef pose, joint positions, object pose, actions)
- Camera images (if enabled)
- Subtask signals for task segmentation (move to fridge, grasp, move to microwave, place inside microwave), used by Isaac Lab Mimic to generate new datasets from the 10 teleoperated trials
See the scripts in imitation_learning/isaaclab_mimic/ for annotation and generation details.
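For a quick look inside one of these files, here is a short inspection snippet following robomimic's HDF5 layout (demos live under `data/demo_*`; the exact observation keys in this project's files may differ):

```python
# Inspect a generated dataset (robomimic HDF5 layout: demos under "data/demo_*";
# exact observation keys in this project's files may differ).
import h5py

with h5py.File("./datasets/generated_dataset_large.hdf5", "r") as f:
    demos = sorted(f["data"].keys())
    print(f"{len(demos)} demos, total samples: {f['data'].attrs.get('total')}")
    demo = f["data"][demos[0]]
    print("samples in first demo:", demo.attrs["num_samples"])
    print("observation keys:", list(demo["obs"].keys()))  # low-dim obs, camera images
    print("actions shape:", demo["actions"].shape)
```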
- IsaacLab: Core simulation and RL environment
- Robomimic: Behavior cloning and imitation learning framework
- Isaac Mimic: Annotation, dataset expansion, and imitation learning documentation
- Franka Emika Panda: Robot platform used in the task
- Kinova IsaacLab Sim2Real: Sim2real pipeline for Kinova and IsaacLab
- Robomimic Pretrained Representations: R3M model usage in BC training
- R3M Model: Pretrained R3M representation for robot learning
- IsaacLab Tutorials: IsaacLab tutorials and guides
- IsaacLab Task Workflows: IsaacLab task workflow documentation
- IsaacLab Official Tutorials: IsaacLab official tutorials
- Lightwheel, "Lightwheel Kitchen: 3D Kitchen Asset Collection for NVIDIA Isaac Sim," Version v1, 2025. [Online]. Available: https://github.com/LightwheelAI/Lightwheel_Kitchen — Kitchen assets used in this project
For further details, see the documentation and code comments in the respective modules.
This project is maintained and developed as part of a Cognitive Robotics course project.
Contributions:
Sai Mukkundan Ramamoorthy - Kitchen scene setup script in IsaacLab, parallel-environment simulation setup with Franka, and behavior-cloning training and validation with state-based and visuomotor policies.
Aaron Cuthinho - Teleoperation, dataset annotation, dataset augmentation/generation, and RL PPO scripts.
Saloni Pathak - Kitchen scene setup script with the Kinova arm in RoboCasa, and teleoperation script setup in RoboCasa.

