Home
Greetings! I am a Ph.D. student at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University, working with Prof. Alexander G. Hauptmann. I am also a Student Researcher at Google, working with Dr. Lu Jiang. I graduated summa cum laude from Peking University, China, with double bachelor’s degrees in Computer Science and Economics. My research interests center on multi-modal foundation models, especially for multi-task video generation.
News
- [01/2024] I was invited to give talks at NYU, CalTech, HKUST, ICT CAS, ByteDance, Baidu, etc.
- [01/2024] One paper for scalable visual tokenization accepted at ICLR 2024.
- [12/2023] Introducing VideoPoet, a large language model for zero-shot video generation, enabled by the MAGVIT-v2 tokenizer.
- [09/2023] I presented my thesis proposal, "Towards Multi-Modal Foundation Models: A Multi-Task Generative Perspective."
- [09/2023] One paper for multi-modal generation with frozen LLMs accepted at NeurIPS 2023 as a spotlight (top 3.1% among 12.3k submissions).
- [02/2023] One paper for video generation with masked transformers accepted at CVPR 2023 as a highlight (top 2.5% among 9.2k submissions).
- [01/2023] One paper for continuous-time discrete diffusion accepted at ICLR 2023.
- [07/2022] One paper for zero-shot action recognition accepted at ECCV 2022.
- [11/2021] We helped The Washington Post analyze the crowd density at the Astroworld Festival (watch).
- [10/2021] We won 1st place at the ICCV 2021 ROAD Challenge: Action Detection Task.
- [06/2021] We won 1st place at the CVPR 2021 ActivityNet Challenge: ActEV SDL and Kinetics-700 tasks.
- [09/2020] I was named a Siebel Scholar, class of 2021 (top 5 at CMU SCS). Press: BusinessWire, Bloomberg, Yahoo, CMU.