Greetings! I am a Ph.D. student at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University, working with Prof. Alexander G. Hauptmann. I am also a Research Intern at Google. I graduated summa cum laude from Peking University, China, with double bachelor’s degrees in Computer Science and Economics. My research interests center on multi-modal foundation models, especially for multi-task video generation.
- [09/2023] I presented my thesis proposal, "Towards Multi-Modal Foundation Models: A Multi-Task Generative Perspective."
- [09/2023] One paper on multi-modal generation with frozen LLMs accepted at NeurIPS 2023 as a spotlight (top 3.1% among 12.3k submissions).
- [02/2023] One paper on video generation with masked transformers accepted at CVPR 2023 as a highlight (top 2.5% among 9.2k submissions).
- [01/2023] One paper on continuous-time discrete diffusion accepted at ICLR 2023.
- [11/2022] We introduce the multi-task masked generative video transformer, MAGVIT.
- [07/2022] One paper on zero-shot action recognition accepted at ECCV 2022.
- [11/2021] We helped The Washington Post analyze the crowd density at the Astroworld Festival (watch).
- [10/2021] We won 1st place in the ICCV 2021 ROAD Challenge: Action Detection Task.
- [06/2021] We won 1st place in the CVPR 2021 ActivityNet Challenge: ActEV SDL and Kinetics-700 tasks.
- [09/2020] I was named a Siebel Scholar, Class of 2021 (top 5 at CMU SCS). Press: BusinessWire, Bloomberg, Yahoo, CMU.