State Key Laboratory of Pattern Recognition
Advised by Prof. Tieniu Tan
I am a final-year PhD student at the State Key Laboratory of Pattern Recognition, University of Chinese Academy of Sciences. My research focuses on the training and evaluation of multimodal large language models (MLLMs). Recently, I have become particularly interested in building agentic MLLMs with infinite context length and unlimited exploration space, as well as developing advanced memory management mechanisms to enhance the perception capabilities of MLLMs.
I have published 15+ papers as first author / co-first author / corresponding author at top-tier venues, with 4,500+ citations in total and 2,100+ citations for my most cited first-author work.
Previously, I was fortunate to work with Prof. Jingdong Wang at Microsoft Research Asia and Prof. Rong Jin at Alibaba DAMO Academy. I have also interned at ByteDance, Kuaishou, Skywork, and Squirrel AI.
Developing vision-language models (SliME (150+ ★), Keye-VL (700+ ★)), omni-modal MLLMs (VITA (3k+ ★)), and agentic systems (Thyme (500+ ★), Skywork R1V4 (3k+ ★)). I believe that the agentic capabilities of MLLMs are directly tied to their perception abilities.
Building comprehensive evaluation frameworks, including MME-RealWorld (30k+ downloads), MME-Unify (5k+ downloads), MME-VideoOCR, MME-Survey, and VLMEvalKit (3k+ ★). I am always pursuing benchmarks that truly align with human preferences and reflect real-world needs.
Developing alignment techniques for MLLMs through MM-RLHF (200 ★), R1-Reward (250+ ★), and BaseReward, and contributing to the MLLM Alignment Survey. Recently, I have become more interested in rubric-based rewards and self-evolving reward systems.
Applying MLLMs to practical domains including time-series forecasting, AI for education, and content moderation. I am also interested in continual learning, out-of-distribution generalization, and other ML system challenges.
Full list available on Google Scholar. (* denotes equal contribution, β denotes corresponding author)
Research Intern · Multimodal Large Language Models
Research Intern · Agentic Multimodal Systems
Research Intern
Research Intern · LLMs for Education
Research Intern · Advised by Prof. Rong Jin
Research Intern · Advised by Prof. Jingdong Wang
ML/AI: ICML (2022-2026), NeurIPS (2022-2026), ICLR (2023-2026), AISTATS (2025), AAAI (2023-2024)
Vision: CVPR (2022-2024), ICCV (2023, 2025), ECCV (2022, 2024)
NLP: ACL (2025), EMNLP (2023-2024), NAACL (2024)
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
IEEE Transactions on Image Processing (TIP)
International Journal of Computer Vision (IJCV)
Transactions on Machine Learning Research (TMLR)
IEEE Transactions on Information Forensics & Security (T-IFS)
PC Member for MILETS@PAKDD'23, DMLR@ICML'23