Ground Truth |
97.6 |
- |
- |
- |
Ours (Full dataset) |
90.6 |
4.76 |
0.81 |
0.41 |
w/o visual memory prediction |
89.3 |
3.13 |
0.91 |
0.50 |
Ours (Pilot Dataset) |
89.2 |
2.04 |
0.87 |
0.47 |
w/ Markovian past state |
88.8 |
1.56 |
1.04 |
0.52 |
w/ Hybrid generation (I20 P10) |
88.7 |
2.17 |
0.89 |
0.49 |
w/o attention |
86.6 |
2.78 |
1.00 |
0.49 |
w/ DDIM generation (n=30) |
85.3 |
0.46 |
0.92 |
0.52 |
w/o semantic (RGBD only) |
84.1 |
4.17 |
0.91 |
0.53 |
w/o visual input (Traj only) |
82.5 |
2.04 |
1.19 |
0.48 |