Franka Kitchen Pick-and-Place Task implemented in IsaacLab using a Behavior Cloning-based Imitation Learning Policy
- This repository contains the implementation, data-generation pipelines, and trained behavior-cloning models for a kitchen pick-and-place task using IsaacLab.
- Task: a Franka Emika manipulator picks a tomato soup can (container) from inside a fridge and places it inside the microwave cavity.
- We reused the IsaacLab sample Franka Lift Cube task and adapted it to our kitchen task by modifying IsaacLab's internal source code.
- To reuse the code, clone the IsaacLab 2.1.1 release with Isaac Sim 4.5, and replace the existing source code with the files and folders under IsaacLab_internal_source.
- The folders outside IsaacLab_internal_source recreate a similar kitchen scene with a Kinova Gen3 7-DOF arm as an external IsaacLab project, primarily intended for an RL PPO implementation (not implemented yet).
- A custom kitchen scene (shelf, fridge, microwave, tomato soup can) with a Franka Emika Panda arm is provided.
- Teleoperated demonstrations were recorded, annotated (Isaac Mimic tooling), and expanded with Isaac Lab Mimic generation to produce ~300 annotated demonstrations (from 10 teleoperated trials, with parallelized generation).
- Behavior cloning training was performed using robomimic with multiple configurations (state-based and visuomotor image-based policies). Several 'fast' and 'ultrafast' variants were used to trade off training time against accuracy.
- main (default) - Main/production branch with stable code containing the state-based and visuomotor implementations for the Franka kitchen task
- dev (stale) - Not used
- il_bc_visuomotor - Imitation Learning with Behavior Cloning, state-based and visuomotor policy implementations; contains code for MimicGen and robomimic IsaacLab envs & configs
- kitchen_scene - Basic kitchen environment scene implementation with the Kinova arm; contains code specific to teleoperation of the Kinova robot in IsaacLab using absolute and relative IK
- rl_ppo_franka - Reinforcement Learning with the PPO algorithm for the Franka robot on the kitchen task (not tested)
- robocasa - RoboCasa simulation environment integration with the Kinova arm, the kitchen scene, and a basic teleoperation script
- Record teleop demos (example using the teleoperation script):

```bash
./isaaclab.sh -p scripts/environments/teleoperation/teleop_se3_agent.py \
  --task Isaac-Lift-Kitchen-Franka-IK-Rel-v0 \
  --num_envs 1 --enable_cameras
```

- Annotate teleop demos (example):

```bash
./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/annotate_demos.py \
  --task Isaac-Kitchen-Lift-Franka-IK-Rel-Visuomotor-Mimic-v0 \
  --input_file ./datasets/kitchen_task_vision_11_.hdf5 \
  --output_file ./datasets/annotated_dataset_modified2.hdf5 \
  --enable_cameras
```

- Generate the expanded dataset (Isaac Lab Mimic generation, 10 parallel envs -> 300 trials):

```bash
./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/generate_dataset.py \
  --enable_cameras --headless --num_envs 10 --generation_num_trials 300 \
  --input_file ./datasets/annotated_dataset_modified2.hdf5 \
  --output_file ./datasets/generated_dataset_large.hdf5
```

Below is a screenshot showing the 300-trial generated dataset replayed using the replay demo script (5 parallel envs). The command used to run the replay demo is shown for reference.

Replay command used:

```bash
./isaaclab.sh -p scripts/tools/replay_demos.py \
  --task Isaac-Lift-Kitchen-Franka-IK-Rel-v0 \
  --dataset_file ./datasets/generated_dataset_sample_300.hdf5 --enable_cameras --num_envs 5
```
Note: when expanding 10 manually annotated teleoperated demonstrations into 300 trials with Isaac Lab Mimic generation, the generator produces both successful and failed trials, and the generated dataset includes both kinds; a trial is considered successful based on the success-term function `object_in_microwave_and_hand_out()`.
Below is a visualization showing the dataset expansion process (success & failure trials):
- Split the data using the script from the robomimic repo (train/validation, 1:10 ratio):

```bash
./isaaclab.sh -p split_train_val.py \
  --dataset ./datasets/generated_dataset_sample_300.hdf5 --ratio 0.1
```

- Example training (state-based BC):
```bash
./isaaclab.sh -p scripts/imitation_learning/robomimic/train.py \
  --task Isaac-Lift-Kitchen-Franka-IK-Rel-v0 \
  --algo bc \
  --dataset ./datasets/generated_dataset_sample_300.hdf5
```

- Example play / evaluation (state-based BC):
```bash
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py \
  --task Isaac-Lift-Kitchen-Franka-IK-Rel-v0 --num_rollouts 50 \
  --checkpoint /path/to/models/model_epoch_best_validation.pth \
  --horizon 2500 --enable_cameras
```

- Visuomotor ultra-fast training/play example:
```bash
./isaaclab.sh -p scripts/imitation_learning/robomimic/train.py \
  --task Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-UltraFast-v0 \
  --algo bc --dataset ./datasets/generated_dataset_sample_300.hdf5

./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py \
  --task Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-UltraFast-v0 --num_rollouts 50 \
  --checkpoint /path/to/models/best_validation_epoch.pth --enable_cameras
```

- A mimic env cfg and mimic env were added to support dataset expansion (Isaac Lab Mimic generation) from a small set of annotated trials. The mimic env supports generating annotated datasets with cameras enabled and parallel envs.
- Subtask breakdown in the mimic environment: move to fridge, grasp tomato can, move to microwave, place in microwave (see `franka_kitchen_lift_visuomotor_mimic_env_cfg.py`; a configuration sketch follows this list).
- Custom success detection function `object_in_microwave_and_hand_out()` verifies tomato-can placement, gripper openness, and hand clearance (see `kitchen_joint_pos_env_cfg.py`; a sketch also follows this list).
- Custom observation functions: `object_position_in_robot_root_frame()`, `ee_frame_pos()`, `ee_frame_quat()`, `gripper_pos()` (see `mdp/observations.py`).
- Asset reuse from Lightwheel: the Franka kitchen scene reuses assets from the Kinova folder for consistency and compatibility (see `config/kinova/assets/`).
- Scripts & logs used to generate the 300 annotated demos from 10 teleoperated trials were added; logs are available in `log_dir/`.
- Robomimic BC configs were added for state-based and visuomotor policies (multiple RNN and ResNet-18 + R3M variants).
- Lighting randomization, scene scaling for parallel envs, and wrist-camera optimizations were added for better generalization and stable data collection.
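As a rough illustration of the subtask breakdown above, here is a minimal sketch using IsaacLab's Mimic config classes (`MimicEnvCfg`/`SubTaskConfig`). The signal names, object references, and offset ranges are assumptions for illustration, and the four subtasks are condensed into two motion segments for brevity; the actual values live in `franka_kitchen_lift_visuomotor_mimic_env_cfg.py`.

```python
# Hypothetical sketch of the Mimic subtask breakdown (signal names, object refs,
# and offsets are illustrative; see franka_kitchen_lift_visuomotor_mimic_env_cfg.py).
from isaaclab.envs.mimic_env_cfg import MimicEnvCfg, SubTaskConfig

# In the real config this is mixed into the task's env cfg; shown standalone here.
cfg = MimicEnvCfg()
cfg.subtask_configs["franka"] = [
    # Reach into the fridge and grasp the tomato soup can.
    SubTaskConfig(
        object_ref="object",               # the can anchors this motion segment
        subtask_term_signal="grasp",       # boolean signal recorded during annotation
        subtask_term_offset_range=(10, 20),
    ),
    # Carry the can to the microwave and place it inside. The final subtask
    # carries no termination signal; it ends with the episode.
    SubTaskConfig(
        object_ref="microwave",
        subtask_term_signal=None,
        subtask_term_offset_range=(0, 0),
    ),
]
```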
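And a minimal sketch of what a success term like `object_in_microwave_and_hand_out()` could look like in IsaacLab's manager-based MDP style. The scene-entity names, cavity center, and distance thresholds are assumptions, and the gripper-openness check is omitted for brevity; the actual logic lives in `kitchen_joint_pos_env_cfg.py`.

```python
# Hypothetical success-term sketch (entity names and thresholds are assumptions;
# see kitchen_joint_pos_env_cfg.py for the actual implementation).
import torch
from isaaclab.envs import ManagerBasedRLEnv
from isaaclab.managers import SceneEntityCfg


def object_in_microwave_and_hand_out(
    env: ManagerBasedRLEnv,
    object_cfg: SceneEntityCfg = SceneEntityCfg("object"),
    ee_frame_cfg: SceneEntityCfg = SceneEntityCfg("ee_frame"),
    microwave_center: tuple[float, float, float] = (0.6, -0.4, 0.9),  # assumed cavity center (world frame)
    in_radius: float = 0.08,   # assumed: can must sit within 8 cm of the cavity center
    clearance: float = 0.15,   # assumed: hand must retreat at least 15 cm from the can
) -> torch.Tensor:
    """Success when the can sits in the microwave cavity and the gripper has pulled out."""
    obj = env.scene[object_cfg.name]
    ee_frame = env.scene[ee_frame_cfg.name]
    obj_pos = obj.data.root_pos_w                    # (num_envs, 3)
    ee_pos = ee_frame.data.target_pos_w[..., 0, :]   # (num_envs, 3) end-effector position
    center = torch.tensor(microwave_center, device=env.device)
    can_in_microwave = torch.norm(obj_pos - center, dim=-1) < in_radius
    hand_out = torch.norm(ee_pos - obj_pos, dim=-1) > clearance
    # (gripper-openness check omitted for brevity)
    return can_in_microwave & hand_out
```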
- IsaacLab_internal_source/
- lift/ — core lift task code and configs
- lift/lift_env_cfg.py — base environment config with scene setup, MDP settings
- lift/mdp/ — custom MDP components
- custom_events.py — light-intensity randomization for scene variation (a sketch appears after this listing)
- observations.py — custom observation functions (object position, EE frame, gripper pos); a sketch also appears after this listing
- rewards.py, terminations.py — reward and termination functions
- lift/config/franka/ — Franka-specific envs and kitchen scene
- kitchen_scene.py — kitchen scene (table, shelf, fridge, microwave, tomato soup can) with Franka arm
- kitchen_joint_pos_env_cfg.py — joint-position kitchen env for Franka with camera config; contains the simulation logic for all parts of the project
- kitchen_ik_rel_env_cfg.py — relative IK env for Franka (teleop/eval)
- kitchen_teleop_env_cfg.py — teleoperation env + tweaks for demo collection
- agents/ — agent config files for training (robomimic / rsl / sb3 / skrl)
- `__init__.py` — Gym environment registrations
- lift/config/kinova/ — original kinova assets reused by the kitchen scene
- isaaclab_mimic/ — mimic environment for dataset generation
- envs/ — mimic environment implementations
- franka_kitchen_lift_visuomotor_mimic_env_cfg.py — visuomotor mimic env config with subtask breakdown
- franka_kitchen_lift_visuomotor_mimic_env.py — environment wrapper for annotation and data generation for the kitchen task using Isaac Lab Mimic
- `__init__.py` — Gym registration for mimic environments
- imitation_learning/isaaclab_mimic/ — mimic annotation + generation scripts
- annotate_demos.py, generate_dataset.py, consolidated_demo.py — utilities for annotation and dataset generation
- trained_bc_models/ — trained BC artifacts and zipped bundles; contains configs & logs. Models are attached separately to the GitHub releases.
- log_dir/ — logs generated during Isaac Lab Mimic annotation and dataset-generation pipelines, plus rollout logs (validation with the trained BC IL models).
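As referenced in the listing above, here is a minimal sketch of what a light-intensity randomization event term could look like. The prim path, intensity range, and USD attribute access are assumptions for illustration; the project's actual version is in custom_events.py.

```python
# Hypothetical light-randomization event term (prim path and intensity range are
# assumptions; see custom_events.py for the project's actual implementation).
import torch
import omni.usd
from isaaclab.envs import ManagerBasedEnv


def randomize_light_intensity(
    env: ManagerBasedEnv,
    env_ids: torch.Tensor,
    intensity_range: tuple[float, float] = (3000.0, 10000.0),
    light_prim_path: str = "/World/Light",  # assumed path to the scene's light
):
    """Sample a new intensity and write it to the light prim's USD attribute.

    The light here is scene-global (shared across cloned envs), so env_ids is unused.
    """
    low, high = intensity_range
    intensity = float(torch.empty(1).uniform_(low, high))
    stage = omni.usd.get_context().get_stage()
    light_prim = stage.GetPrimAtPath(light_prim_path)
    light_prim.GetAttribute("inputs:intensity").Set(intensity)
```

Such a term would typically be registered in the env config's events section via an `EventTermCfg` with `mode="reset"`, so the lighting changes on every episode reset.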
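Likewise, a sketch of the object-position observation expressed in the robot's root frame, following the pattern used by IsaacLab's stock lift task (the scene-entity names are assumptions; see mdp/observations.py for the project's version):

```python
# Sketch of an object-position observation in the robot root frame, following
# IsaacLab's lift-task pattern (scene-entity names are assumptions).
import torch
from isaaclab.envs import ManagerBasedRLEnv
from isaaclab.managers import SceneEntityCfg
from isaaclab.utils.math import subtract_frame_transforms


def object_position_in_robot_root_frame(
    env: ManagerBasedRLEnv,
    robot_cfg: SceneEntityCfg = SceneEntityCfg("robot"),
    object_cfg: SceneEntityCfg = SceneEntityCfg("object"),
) -> torch.Tensor:
    """Tomato-can position transformed from the world frame into the robot's root frame."""
    robot = env.scene[robot_cfg.name]
    obj = env.scene[object_cfg.name]
    object_pos_w = obj.data.root_pos_w[:, :3]
    object_pos_b, _ = subtract_frame_transforms(
        robot.data.root_state_w[:, :3], robot.data.root_state_w[:, 3:7], object_pos_w
    )
    return object_pos_b
```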
Please find the model results here
Training Stats (Example):
```
Epoch 599 memory usage: 3023 MB
Train epoch 600:      loss ≈ -22, log-likelihood ≈ 22, policy grad norms ≈ 7465
Validation epoch 600: loss ≈ 413,689, log-likelihood ≈ -413,689
Checkpoint saved at epoch 600; memory usage stable at 3023 MB.
```
Observations:
- Training loss remained around -22 for many epochs, while validation loss stayed in the hundreds of thousands, with no significant improvement across all 5 configs (state-based and visuomotor policies).
- State-based policy: the Franka arm moved near the fridge and attempted to grasp the can, but missed every time. Adding a camera and training a visuomotor policy was expected to help, but the results were sometimes worse than the state-based policy's.
- Visuomotor policy: performance was similar across configs, regardless of RNN size. Pretrained R3M visual features led to less erratic movement, but arm motion was slow and grasps still failed, likely due to the lack of depth input. Training the visual policy from scratch performed worse than initializing with R3M weights.
Evaluation Command Example:
```bash
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py \
  --task Isaac-Lift-Kitchen-Franka-IK-Rel-v0 --num_rollouts 20 \
  --checkpoint /path/to/models/model_epoch_best_validation.pth \
  --horizon 2500 --enable_cameras
```

Full evaluation for all 5 configs (state & visuomotor):
```bash
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py --task Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-UltraFast-v0 --num_rollouts 20 --checkpoint /home/saiga/Documents/Cognitive_Robotics/simulation_packages/IsaacLab/logs/robomimic/Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-UltraFast-v0/bc_rnn_image_franka_kitchen_lift_ultrafast/20250908045749/models/model_epoch_140.pth --enable_cameras --horizon 2400 --headless && \
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py --task Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-v0 --num_rollouts 20 --checkpoint /home/saiga/Documents/Cognitive_Robotics/simulation_packages/IsaacLab/logs/robomimic/Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-v0/bc_rnn_image_franka_kitchen_lift/20250908100825/models/model_epoch_435_best_validation_245846.2421875.pth --enable_cameras --horizon 2400 --headless && \
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py --task Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-Fast-v0 --num_rollouts 20 --checkpoint /home/saiga/Documents/Cognitive_Robotics/simulation_packages/IsaacLab/logs/robomimic/Isaac-Lift-Kitchen-Franka-IK-Rel-Visuomotor-Fast-v0/bc_rnn_image_franka_kitchen_lift_fast/20250907053339/models/model_epoch_78_best_validation_198896.4625.pth --enable_cameras --horizon 2400 --headless && \
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py --task Isaac-Lift-Kitchen-Franka-IK-Rel-v0 --num_rollouts 20 --checkpoint /home/saiga/Documents/Cognitive_Robotics/simulation_packages/IsaacLab/logs/robomimic/Isaac-Lift-Kitchen-Franka-IK-Rel-v0/bc_rnn_low_dim_franka_kitchen_lift/20250907215215/models/model_epoch_858_best_validation_602055.42890625.pth --enable_cameras --horizon 2400 --headless && \
./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py --task Isaac-Lift-Kitchen-Franka-IK-Rel-Fast-v0 --num_rollouts 20 --checkpoint /home/saiga/Documents/Cognitive_Robotics/simulation_packages/IsaacLab/logs/robomimic/Isaac-Lift-Kitchen-Franka-IK-Rel-Fast-v0/bc_rnn_low_dim_franka_kitchen_lift_fast/20250907205615/models/model_epoch_75_best_validation_199577.3265625.pth --enable_cameras --horizon 2400 --headless
```

See the linked video for more details and rollout results.
Rollout results for all of the above configs can be found in IsaacLab_internal_source/log_dir/bc_rollout_logs.txt.
| Config Name | Input Modality | RNN Hidden Dim / Layers | GMM Modes | Visual Encoder | Pretrained Weights | Key Advantage / Limitation |
|---|---|---|---|---|---|---|
| bc_rnn_low_dim_franka_kitchen_lift_fast | State (low_dim) | 512 / 2 | 5 | None | N/A | Fast training, simple, but lacks visual context |
| bc_rnn_low_dim_franka_kitchen_lift | State (low_dim) | 1000 / 3 | 8 | None | N/A | Larger RNN, more modes, but still no visual input |
| bc_rnn_image_franka_kitchen_lift_ultrafast | Visuomotor (RGB) | 32 / 1 | 3 | R3MConv (resnet18) | Yes | Ultrafast, small RNN, uses pretrained visual features |
| bc_rnn_image_franka_kitchen_lift_fast | Visuomotor (RGB) | 128 / 1 | 5 | R3MConv (resnet18) | Yes | Fast, moderate RNN, pretrained visual features |
| bc_rnn_image_franka_kitchen_lift | Visuomotor (RGB) | 1000 / 3 | 8 | ResNet18Conv | No | Large RNN, visual features trained from scratch |
Summary:
- State-based configs (low_dim) are faster to train and simpler, but lack visual context, which limits grasping performance.
- Visuomotor configs (RGB) use wrist-camera input; those with pretrained R3MConv visual features (ultrafast, fast) showed more stable and less erratic movement, but moved slowly and still failed to grasp, likely due to the lack of depth input.
- The largest visuomotor config (normal) trained visual features from scratch, but did not outperform the pretrained configs and was more erratic.
- Increasing RNN size and GMM modes led to more natural movement, but did not improve task completion.
- Overall, using pretrained R3M weights (ultrafast, fast) provided the most stable visuomotor policies, but none of the configs achieved successful task completion in rollouts.
- This suggests that, given the current results, successful task completion will likely require RL (PPO) training initialized from these BC policy weights.
This section summarizes the most important configuration options for behavior cloning (BC) in IsaacLab/robomimic tasks:
- `algo_name`: Algorithm type (usually `bc` for behavior cloning).
- `experiment`: Controls logging, saving, validation, and rollout settings.
  - `validate`: Enables validation during training.
  - `logging`: Options for TensorBoard and terminal output.
  - `save`: When and how to save checkpoints (e.g., every N epochs, on best validation).
  - `epoch_every_n_steps`: Number of steps per training epoch.
- `train`: Data loading and training-loop settings.
  - `num_data_workers`: Number of CPU workers for data loading (higher = faster, up to available cores).
  - `hdf5_cache_mode`: What to cache in RAM (`all`, `low_dim`, or `None`).
  - `batch_size`: Number of samples per training batch (higher = better GPU utilization, but more VRAM needed).
  - `num_epochs`: Total number of training epochs.
  - `seq_length`: Length of the input sequence for RNNs (higher = more temporal context).
- `algo`: Model architecture and optimization.
  - `optim_params`: Learning rate, decay schedule, and regularization.
  - `actor_layer_dims`: Hidden layer sizes for the policy network.
  - `gmm`: Number of Gaussian Mixture Model modes (higher = more action diversity).
  - `rnn`: RNN settings (enabled, hidden size, layers, etc.).
- `observation`: Defines input modalities and encoders.
  - `modalities`: Which observation types are used (`low_dim`, `rgb`, etc.).
  - `encoder`: Backbone network (e.g., `ResNet18Conv`, `R3MConv`), feature dimension, pooling, and randomization/cropping.
- `batch_size`: Larger batches speed up training and improve stability, but require more GPU memory.
- `num_data_workers`: More workers reduce data-loading bottlenecks, especially for image-based BC.
- `hdf5_cache_mode`: Caching images (`all`) is fastest but uses a lot of RAM; `low_dim` is safer for large datasets.
- `gmm.num_modes`: More modes allow the policy to represent more complex/multimodal actions.
- `rnn.hidden_dim`/`layers`: Larger RNNs capture more temporal dependencies but use more memory.
- `encoder.backbone_class`: The choice of backbone affects visual feature quality and training speed.

See the config files in isaaclab_tasks/manager_based/manipulation/lift/config/franka/agents/robomimic/ for examples and recommended settings for different hardware and task complexity. A minimal sketch of the config structure follows.
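For orientation, here is how these fields nest in a robomimic BC-RNN config, shown as a Python dict mirroring the JSON layout. The values are illustrative examples, not this project's tuned settings; consult the `agents/robomimic/` config files for those.

```python
# Illustrative robomimic BC-RNN config skeleton (values are examples, not the
# project's tuned settings; see the agents/robomimic/ config files for those).
bc_rnn_config = {
    "algo_name": "bc",
    "experiment": {
        "validate": True,                  # run validation during training
        "epoch_every_n_steps": 100,        # gradient steps per "epoch"
        "save": {"enabled": True, "every_n_epochs": 50, "on_best_validation": True},
    },
    "train": {
        "num_data_workers": 4,
        "hdf5_cache_mode": "low_dim",      # cache only low-dim obs; safer for image datasets
        "batch_size": 16,
        "num_epochs": 600,
        "seq_length": 10,                  # temporal context for the RNN
    },
    "algo": {
        "optim_params": {"policy": {"learning_rate": {"initial": 1e-4}}},
        "gmm": {"enabled": True, "num_modes": 5},
        "rnn": {"enabled": True, "hidden_dim": 512, "num_layers": 2},
    },
    "observation": {
        "modalities": {
            "obs": {
                "low_dim": ["eef_pos", "eef_quat", "gripper_pos"],  # assumed key names
                "rgb": ["wrist_cam"],                               # assumed camera key
            }
        },
        "encoder": {"rgb": {"core_kwargs": {"backbone_class": "ResNet18Conv"}}},
    },
}
```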
- Camera config: the wrist camera must have valid `width` and `height` values; the defaults used in this project are 88x88.
- Asset paths: the Franka kitchen scene reuses assets in `lift/config/kinova/assets/`; keep the relative paths intact.
- Mimic generation: generating datasets with images can be memory/GPU intensive. Use headless mode and set `num_envs` according to available GPU resources.
- Validation: BC checkpoints were selected by validation loss; ensure the model and config match the observation/action spaces when loading.
- Ensure the Isaac Sim / IsaacLab environment is installed and `isaaclab.sh` works.
- Record teleop demos using `scripts/environments/teleoperation/teleop_se3_agent.py`.
- Annotate demos via `annotate_demos.py`.
- Run `generate_dataset.py` with `--num_envs` parallelization to expand the annotated trials.
- Split the dataset using the robomimic split script.
- Run the robomimic training scripts (configs available in `agents/` and `trained_bc_models/*/config.json`).
- Evaluate with `robomimic/play.py` using the selected checkpoint.
The dataset consists of annotated demonstrations for the kitchen pick-and-place task, generated using teleoperation and expanded via the mimic environment. Key files:
- `datasets/kitchen_task_vision_11_.hdf5` — raw teleop demonstrations
- `datasets/annotated_dataset_modified2.hdf5` — annotated dataset after processing
- `datasets/generated_dataset_large.hdf5` — expanded dataset (300 trials, 10 parallel envs)
Each dataset contains:
- Low-dimensional observations (eef pose, joint positions, object pose, actions)
- Camera images (if enabled)
- Subtask signals for task segmentation (move to fridge, grasp, move to microwave, place inside microwave), used by Isaac Lab Mimic to generate new datasets from the 10 teleoperated trials
See the scripts in imitation_learning/isaaclab_mimic/ for annotation and generation details.
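For a quick look inside one of these files, here is a short inspection snippet following robomimic's HDF5 layout (demos live under `data/demo_*`; the exact observation keys in this project's files may differ):

```python
# Inspect a generated dataset (robomimic HDF5 layout: demos under "data/demo_*";
# exact observation keys in this project's files may differ).
import h5py

with h5py.File("./datasets/generated_dataset_large.hdf5", "r") as f:
    demos = sorted(f["data"].keys())
    print(f"{len(demos)} demos, total samples: {f['data'].attrs.get('total')}")
    demo = f["data"][demos[0]]
    print("samples in first demo:", demo.attrs["num_samples"])
    print("observation keys:", list(demo["obs"].keys()))  # low-dim obs, camera images
    print("actions shape:", demo["actions"].shape)
```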
- IsaacLab: Core simulation and RL environment
- Robomimic: Behavior cloning and imitation learning framework
- Isaac Mimic: Annotation, dataset expansion, and imitation learning documentation
- Franka Emika Panda: Robot platform used in the task
- Kinova IsaacLab Sim2Real: Sim2real pipeline for Kinova and IsaacLab
- Robomimic Pretrained Representations: R3M model usage in BC training
- R3M Model: Pretrained R3M representation for robot learning
- IsaacLab Tutorials: IsaacLab tutorials and guides
- IsaacLab Task Workflows: IsaacLab task workflow documentation
- IsaacLab Official Tutorials: IsaacLab official tutorials
- Lightwheel, "Lightwheel Kitchen: 3D Kitchen Asset Collection for NVIDIA Isaac Sim," Version v1, 2025. [Online]. Available: https://github.com/LightwheelAI/Lightwheel_Kitchen — Kitchen assets used in this project
For further details, see the documentation and code comments in the respective modules.
This project is maintained and developed as part of a Cognitive Robotics course project.
Contributions:
Sai Mukkundan Ramamoorthy - Kitchen scene setup script in IsaacLab, parallel-environment simulation setup with Franka, and behavior-cloning training and validation with state-based and visuomotor policies.
Aaron Cuthinho - Teleoperation, dataset annotation, dataset augmentation/generation, and RL PPO scripts.
Saloni Pathak - Kitchen scene setup script with the Kinova arm in RoboCasa, and teleoperation script setup in RoboCasa.

