This is our benchmark for ego-view accident video generation driven by the text descriptions annotated in MM-AU. Performance is measured by CLIP Score (CLIPs, higher is better), Fréchet Video Distance (FVD, lower is better), and frames per second (FPS, higher is better). We aim to explore the cause-effect evolution of accident videos conditioned on descriptions of accident reasons or prevention advice.
ID | Method | Year | Code | CLIPs | FVD | FPS | Environment |
---|---|---|---|---|---|---|---|
1 | Tune-A-Video | 2023 | Link | 21.77 | 9545.6 | 1.7 | RTX 3090 |
2 | ControlVideo | 2023 | Link | 22.51 | 12275.2 | 0.5 | RTX 3090 |
3 | OVAD | 2023 | Link | 27.24 | 5238.1 | 1.2 | RTX 3090 |
4 | ModelScope T2V | 2023 | Link | 27.15 | 5088.7 | 1.3 | RTX 3090 |
5 | Text2Video-Zero | 2023 | Link | 27.89 | 12547.0 | 1.1 | RTX 3090 |
6 | Causal-VidSyn | 2025 | Link | 28.70 | 5352.9 | ... | RTX 3090 |
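
For readers reproducing the CLIPs and FPS columns, the following is a minimal sketch of how such numbers are commonly computed. It is not the benchmark's official evaluation script. It assumes the text and frame embeddings have already been extracted with a CLIP model, and it uses the common convention of 100 times the clamped cosine similarity, averaged over frames. The function names `clip_style_score` and `frames_per_second` are hypothetical.

```python
import math
import time
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def clip_style_score(
    text_embedding: Sequence[float],
    frame_embeddings: Sequence[Sequence[float]],
) -> float:
    """Mean over frames of 100 * max(cos(text, frame), 0).

    Assumption: embeddings are precomputed by a CLIP model; the exact
    scaling used by the benchmark is not stated in this document.
    """
    if not frame_embeddings:
        raise ValueError("at least one frame embedding is required")
    sims = [
        max(cosine_similarity(text_embedding, frame), 0.0)
        for frame in frame_embeddings
    ]
    return 100.0 * sum(sims) / len(sims)


def frames_per_second(generate: Callable[[], object], num_frames: int) -> float:
    """Time one call to `generate` and return num_frames / elapsed seconds.

    Assumption: `generate` is a hypothetical callable that produces a
    clip of exactly `num_frames` frames.
    """
    start = time.perf_counter()
    generate()
    elapsed = time.perf_counter() - start
    if elapsed <= 0.0:
        raise RuntimeError("elapsed time too small to measure")
    return num_frames / elapsed
```

As a usage example, `clip_style_score([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` averages one perfectly aligned frame and one orthogonal frame. FVD is omitted here because it requires features from a pretrained video network, which is outside the scope of a short sketch.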