Table 1. Metric comparison across ablations
Ablation	Collision ↑	Smoothness ↑	Best of 1 ↓	Best of 15 ↓
Ground Truth	97.6	-	-	-
Ours (Full dataset)	90.6	4.76	0.81	0.41
w/o visual memory prediction	89.3	3.13	0.91	0.50
Ours (Pilot Dataset)	89.2	2.04	0.87	0.47
w/ Markovian past state	88.8	1.56	1.04	0.52
w/ Hybrid generation (I20 P10)	88.7	2.17	0.89	0.49
w/o attention	86.6	2.78	1.00	0.49
w/ DDIM generation (n=30)	85.3	0.46	0.92	0.52
w/o semantic (RGBD only)	84.1	4.17	0.91	0.53
w/o visual input (Traj only)	82.5	2.04	1.19	0.48