
Vidi: Large Multimodal Models for Video Understanding and Editing

Homepage: https://bytedance.github.io/vidi-website/

Github: https://github.com/bytedance/vidi

Demo: https://vidi.byteintl.com/

We introduce Vidi, a family of Large Multimodal Models (LMMs) for a wide range of video understanding and editing (VUE) scenarios. The first release focuses on temporal retrieval (TR), i.e., identifying the time ranges in input videos corresponding to a given text query.

This checkpoint is that first release, targeting temporal retrieval.
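To make the task concrete, a temporal-retrieval answer for one text query can be represented as a list of (start, end) time ranges in seconds. A common way to score such an answer is temporal IoU: the overlap between predicted and ground-truth ranges divided by their union. The following is a minimal Python sketch under those assumptions. It is not the evaluation code from the Vidi repository, and the function names and the example query are hypothetical.

```python
from typing import List, Tuple

# Assumption: a time range is (start, end) in seconds, with start <= end.
TimeRange = Tuple[float, float]


def merge_ranges(ranges: List[TimeRange]) -> List[TimeRange]:
    """Merge overlapping or touching ranges into a sorted, disjoint list."""
    merged: List[TimeRange] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(ranges: List[TimeRange]) -> float:
    """Total duration covered by the ranges, counting overlaps once."""
    return sum(end - start for start, end in merge_ranges(ranges))


def intersection_length(a: List[TimeRange], b: List[TimeRange]) -> float:
    """Total duration covered by both range sets (two-pointer sweep)."""
    a, b = merge_ranges(a), merge_ranges(b)
    i = j = 0
    overlap = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            overlap += hi - lo
        # Advance whichever range ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlap


def temporal_iou(pred: List[TimeRange], gt: List[TimeRange]) -> float:
    """Intersection over union of two sets of time ranges."""
    inter = intersection_length(pred, gt)
    union = total_length(pred) + total_length(gt) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical query "a dog catches a frisbee" with two ground-truth ranges.
    gt = [(12.0, 18.0), (40.0, 44.0)]
    pred = [(10.0, 17.0), (41.0, 45.0)]
    print(f"tIoU = {temporal_iou(pred, gt):.3f}")  # prints "tIoU = 0.615"
```

Merging the ranges first keeps overlapping predictions from being double-counted, so a model cannot inflate its score by emitting duplicate or nested segments for the same moment.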

The inference and evaluation code is available at https://github.com/bytedance/vidi.

Citation

If you find Vidi useful for your research and applications, please cite using this BibTeX:

@article{Vidi2025vidi2,
          title={Vidi2: Large Multimodal Models for Video 
                  Understanding and Creation},
          author={Vidi Team, Celong Liu, Chia-Wen Kuo, Chuang Huang, 
                  Dawei Du, Fan Chen, Guang Chen, Haoji Zhang, 
                  Haojun Zhao, Lingxi Zhang, Lu Guo, Lusha Li, 
                  Longyin Wen, Qihang Fan, Qingyu Chen, Rachel Deng,
                  Sijie Zhu, Stuart Siew, Tong Jin, Weiyan Tao,
                  Wen Zhong, Xiaohui Shen, Xin Gu, Zhenfang Chen, Zuhua Lin},
          journal={arXiv preprint arXiv:2511.19529},
          year={2025}
}

@article{Vidi2025vidi,
          title={Vidi: Large Multimodal Models for Video 
                  Understanding and Editing},
          author={Vidi Team, Celong Liu, Chia-Wen Kuo, Dawei Du, 
                  Fan Chen, Guang Chen, Jiamin Yuan, Lingxi Zhang,
                  Lu Guo, Lusha Li, Longyin Wen, Qingyu Chen, 
                  Rachel Deng, Sijie Zhu, Stuart Siew, Tong Jin, 
                  Wei Lu, Wen Zhong, Xiaohui Shen, Xin Gu, Xing Mei, 
                  Xueqiong Qu, Zhenfang Chen},
          journal={arXiv preprint arXiv:2504.15681},
          year={2025}
}
Model size: 9B params (Safetensors, BF16 tensors)

Model ID: bytedance-research/Vidi-7B