
Greetings! I am a Ph.D. student at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University, working with Prof. Alexander G. Hauptmann. I am also a Student Researcher at Google, working with Dr. Lu Jiang. I graduated summa cum laude from Peking University, China, with double bachelor’s degrees in Computer Science and Economics. My research interests center on multi-modal foundation models, especially for multi-task video generation.


  • [01/2024] I was invited to give talks at NYU, Caltech, HKUST, ICT CAS, ByteDance, Baidu, and other institutions.
  • [01/2024] One paper on scalable visual tokenization was accepted at ICLR 2024.
  • [12/2023] Introducing VideoPoet, a large language model for zero-shot video generation, enabled by the MAGVIT-v2 tokenizer.
  • [12/2023] Introducing WALT, a latent video diffusion transformer, enabled by MAGVIT-v2.

Selected Publications (Full List)

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Dan Kondratyuk*, Lijun Yu*, Xiuye Gu*, José Lezama*, Jonathan Huang, Rachel Hornung, Hartwig Adam, Hassan Akbari, Yair Alon, Vighnesh Birodkar, Yong Cheng, Ming-Chang Chiu, Josh Dillon, Irfan Essa, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, David Ross, Grant Schindler, Mikhail Sirotenko, Kihyuk Sohn, Krishna Somandepalli, Huisheng Wang, Jimmy Yan, Ming-Hsuan Yang, Xuan Yang, Bryan Seybold*, Lu Jiang* (*Equal contribution). In preprint, 2023

Selected Talks (Full List)

Selected Projects (Full List)