
Commit db13c57

2 parents f2e3bb5 + 7f3add7

7 files changed: +68 -29 lines changed


.gitignore

Lines changed: 2 additions & 1 deletion
@@ -1,3 +1,4 @@
 _site/
 Gemfile.lock
-.DS_Store
+.DS_Store
+.idea/

_pages/PPL.md

Lines changed: 12 additions & 3 deletions
@@ -17,7 +17,7 @@ institutions:
 
 nav: false
 nav_order: 1
-code_link: https://metadriverse.github.io/ppl/
+code_link: https://github.com/metadriverse/PPL
 pdf_link: https://arxiv.org/pdf/2510.01545
 ---
 
@@ -153,6 +153,16 @@ We also verify that PPL is robust to noises in the trajectory prediction model.
 
 
 
+
+<!--research-section-splitter-->
+
+## Related Works from Us
+
+{% capture collection_hitl %}{% include_relative collection_human_in_the_loop.md %}{% endcapture %}
+{{ collection_hitl | markdownify }}
+
+
+
 <!--research-section-splitter-->
 
 ## Reference
@@ -168,5 +178,4 @@
 ```
 
 
-
-<!-- **Acknowledgement**: The project was supported by NSF grants CCF-2344955 and IIS-2339769. ZP is supported by the Amazon Fellowship via UCLA Science Hub. -->
+<!-- **Acknowledgement**: The project was supported by NSF grants CCF-2344955 and IIS-2339769. ZP is supported by the Amazon Fellowship via UCLA Science Hub. -->

_pages/PVP.md

Lines changed: 3 additions & 19 deletions
@@ -169,28 +169,12 @@ It also makes human takes over less (above table).
 </div>
 
 
-
 <!--research-section-splitter-->
 
-## Prior Works
-
-
-**Expert Guided Policy Optimization (CoRL 2021)**:
-Our research on human-in-the-loop policy learning began in 2021.
-The first published work is [Expert Guided Policy Optimization (EGPO)](https://decisionforce.github.io/EGPO/).
-In this work, we explored how an RL agent can benefit from the intervention of a PPO expert.
-
-
-**Human-AI Copilot Optimization (ICLR 2022)**:
-Building upon the methodology of EGPO, and substituting the PPO expert with a real human subject, we proposed [Human-AI Copilot Optimization (HACO)](https://decisionforce.github.io/HACO/) and it demonstrated significant improvements in learning efficiency over traditional RL baselines.
-
-
-**Teacher-Student Shared Control (ICLR 2023)**:
-In [Teacher-Student Shared Control (TS2C)](https://metadriverse.github.io/TS2C/), we examined the impact of using the value function as a criterion for determining when the PPO expert should intervene. The value function-based intervention makes it possible for the student agent to learn from a suboptimal teacher.
-
+## Related Works from Us
 
-**Proxy Value Propagation (NeurIPS 2023 Spotlight)**: Considering the reward-free setting, we proposed several improvements to enhance learning from active human involvement.
-These improvements address issues observed in HACO, including the jittering and oscillation of the learning agent, catastrophic forgetting, and challenges in learning sparse yet crucial behaviors.
+{% capture collection_hitl %}{% include_relative collection_human_in_the_loop.md %}{% endcapture %}
+{{ collection_hitl | markdownify }}
 
 
 

_pages/RobotDog.md

Lines changed: 11 additions & 2 deletions
@@ -145,8 +145,17 @@ Human Following:
 </div>
 
 
+<!--research-section-splitter-->
+
+## Related Works from Us
+
+{% capture collection_hitl %}{% include_relative collection_human_in_the_loop.md %}{% endcapture %}
+{{ collection_hitl | markdownify }}
+
+
+
+<!--research-section-splitter-->
 
-<div class="research-section">
 <h3 style="text-align: center">Reference</h3>
 
 <pre><code class="language-plain">@article{peng2025data,
@@ -156,7 +165,7 @@ Human Following:
 year={2025}
 }
 </code></pre>
-</div>
+
 
 
 

_pages/Team.md

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
 ---
 layout: null
 permalink: /team/
-title: Zhou Lab at UCLA
+title: VAIL Lab at UCLA
 description: "Meet Our Team Member!"
 nav: true
 nav_order: 4
@@ -10,12 +10,12 @@ nav_order: 4
 <!DOCTYPE html>
 <html>
 <head>
-<meta http-equiv="refresh" content="0; url=https://boleizhou.github.io/lab/" />
+<meta http-equiv="refresh" content="0; url=https://vail-ucla.github.io/" />
 <script type="text/javascript">
-window.location.href = "https://boleizhou.github.io/lab/";
+window.location.href = "https://vail-ucla.github.io/";
 </script>
 </head>
 <body>
-<p>If you are not redirected automatically, <a href="https://boleizhou.github.io/lab/">click here</a>.</p>
+<p>If you are not redirected automatically, <a href="https://vail-ucla.github.io/">click here</a>.</p>
 </body>
 </html>

_pages/aim.md

Lines changed: 9 additions & 0 deletions
@@ -103,6 +103,15 @@ A case study in a toy MetaDrive environment shows that our method AIM reduces ex
 <div class="img-container" style="width: 80%; margin: 0 auto;">
 <img src="../assets/img/aim/case-study.png" class="my-image" alt="Image" />
 </div>
+
+<!--research-section-splitter-->
+
+## Related Works from Us
+
+{% capture collection_hitl %}{% include_relative collection_human_in_the_loop.md %}{% endcapture %}
+{{ collection_hitl | markdownify }}
+
+
 <!--research-section-splitter-->
 
 

_pages/collection_human_in_the_loop.md (new file)

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+
+
+<!-- Note: Put **[]()**, not [**[]()**] !! -->
+
+
+- **[Predictive Preference Learning (NeurIPS 2025)](https://metadriverse.github.io/ppl/)**: PPL is a model-based online preference learning algorithm. It predicts future failures and learns from hypothetical preference data: if the expert takes over now, it would likely also take over in the nearby states reached by letting the agent keep running.
+
+
+- **[Adaptive Intervention Mechanism (ICML 2025)](https://metadriverse.github.io/aim/)**: AIM is a robot-gated Interactive Imitation Learning (IIL) algorithm that cuts expert takeover cost by 40%.
+
+
+- **[PVP for Real-world Robot Learning (ICRA 2025)](https://metadriverse.github.io/pvp4real/)**: We apply PVP to real-world robot learning, showing that we can train mobile robots from online human intervention and demonstration, from scratch, without reward, from raw sensors, and in 10 minutes!
+
+
+- **[Proxy Value Propagation (PVP) (NeurIPS 2023 Spotlight)](https://metadriverse.github.io/pvp/)**: Proxy Value Propagation (PVP) is an Interactive Imitation Learning algorithm that adopts the reward-free setting and further improves learning from active human involvement. These improvements address the catastrophic forgetting and unstable behavior of the learning agent, and the difficulty in learning sparse yet crucial human behaviors. PVP achieves ***10x faster learning efficiency***, the best user experience, and safer human-robot shared control.
+
+
+- **[Teacher-Student Shared Control (ICLR 2023)](https://metadriverse.github.io/TS2C/)**:
+In Teacher-Student Shared Control (TS2C), we examined the impact of using the value function as a criterion for determining when the PPO expert should intervene. TS2C makes it possible to train a student policy that surpasses the teacher's performance.
+
+
+- **[Human-AI Copilot Optimization (ICLR 2022)](https://decisionforce.github.io/HACO/)**:
+Building upon the methodology of EGPO and substituting the PPO expert with a *real human subject*, we proposed Human-AI Copilot Optimization (HACO), which demonstrated significant improvements in learning efficiency over traditional RL baselines.
+
+
+- **[Expert Guided Policy Optimization (CoRL 2021)](https://decisionforce.github.io/EGPO/)**:
+Our research on human-in-the-loop policy learning began in 2021. The first published work is Expert Guided Policy Optimization (EGPO), where we explored how an RL agent can benefit from the intervention of a PPO expert.
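Each research page now renders this shared list through the same Jekyll pattern shown in the hunks above. For reference, the snippet each page gains looks like this (a minimal sketch: the section-splitter comments follow the site's existing layout, and the include path assumes the new file sits next to the pages in `_pages/`):

```liquid
<!--research-section-splitter-->

## Related Works from Us

{% capture collection_hitl %}{% include_relative collection_human_in_the_loop.md %}{% endcapture %}
{{ collection_hitl | markdownify }}
```

Capturing the `include_relative` output and piping it through `markdownify` converts the raw Markdown list to HTML at that point, so it renders consistently even where the surrounding page mixes Markdown with raw HTML; editing `collection_human_in_the_loop.md` once now updates the "Related Works from Us" section on every page.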
