Yi-Fan Zhang

Yi-Fan Zhang (张一帆)

PhD Candidate, University of Chinese Academy of Sciences

State Key Laboratory of Pattern Recognition

Advised by Prof. Tieniu Tan

About Me

I am a final-year PhD student at the State Key Laboratory of Pattern Recognition, University of Chinese Academy of Sciences. My research focuses on the training and evaluation of multimodal large language models (MLLMs). Recently, I have become particularly interested in building agentic MLLMs with infinite context length and unlimited exploration space, as well as developing advanced memory management mechanisms to enhance the perception capabilities of MLLMs.

I have published 15+ papers as first author / co-first author / corresponding author at top-tier venues, with 4,500+ citations in total and 2,100+ citations for my most cited first-author work.

Previously, I was fortunate to work with Prof. Jingdong Wang at Microsoft Research Asia and Prof. Rong Jin at Alibaba DAMO Academy. I have also interned at ByteDance, Kuaishou, Skywork, and Squirrel AI.

I am actively seeking research positions in both industry and academia. If you are interested in collaboration, internship opportunities, or research discussions, please feel free to reach out.

Research Interests

Multimodal Model Training

Developing vision-language models (SliME (150+ ⭐), Keye-VL (700+ ⭐)), omni-modal MLLMs (VITA (3k+ ⭐)), and agentic systems (Thyme (500+ ⭐), Skywork R1V4 (3k+ ⭐)). I believe that the agentic capabilities of MLLMs are directly tied to their perception abilities.

Model Evaluation

Building comprehensive evaluation frameworks, including MME-RealWorld (30k+ downloads), MME-Unify (5k+ downloads), MME-VideoOCR, MME-Survey, and VLMEvalKit (3k+ ⭐). I am always pursuing benchmarks that truly align with human preferences and reflect real-world needs.

Post-Training & Reward Modeling

Developing alignment techniques for MLLMs through MM-RLHF (200 ⭐), R1-Reward (250+ ⭐), and BaseReward, and contributing to the MLLM Alignment Survey. Recently, I have become more interested in rubric-based rewards and self-evolving reward systems.

Applications & ML Systems

Applying MLLMs to practical domains, including time-series forecasting, AI for education, and content moderation. I am also interested in continual learning, out-of-distribution generalization, and other ML system challenges.

News

Dec 2025 🎉 Two papers accepted by IEEE T-PAMI (IF: 18.6)!
Oct 2025 🎉 VITA 1.5 (Spotlight) and MME-VideoOCR accepted by NeurIPS 2025!
Sep 2025 🚀 Released Thyme - thinking beyond images with executable code generation.
Jul 2025 🚀 Released Kwai Keye-VL, a cutting-edge MLLM by Kuaishou.
May 2025 🎉 MM-RLHF and DAMO accepted by ICML 2025!
May 2025 🚀 Released R1-Reward for multimodal reward modeling.
Apr 2025 🚀 Released MME-Unify benchmark for unified multimodal models.
Feb 2025 🚀 Released MM-RLHF dataset with 120K human preference annotations.
Jan 2025 🎉 MME-RealWorld accepted by ICLR 2025!
Jun 2024 🚀 Released SliME - Beyond LLaVA-HD for High-Resolution MLLMs.
Mar 2024 🎉 Two papers on ICL and symbolic reasoning accepted by NAACL 2024!
Oct 2023 🎉 OneNet accepted by NeurIPS 2023.
May 2023 🎉 AdaNPC accepted by ICML 2023, DRM accepted by KDD 2023.
Jan 2023 🎉 Environment Label Smoothing accepted by ICLR 2023.
Apr 2022 🎉 DDG selected for CVPR 2022 Oral presentation.

Selected Publications

Full list available on Google Scholar. (* denotes equal contribution, † denotes corresponding author)

First Author Papers
Thyme: Think Beyond Images First Author 550+
Yi-Fan Zhang, et al.
Technical Report
Yi-Fan Zhang, et al.
Under review at NeurIPS 2025
Yi-Fan Zhang, et al.
IEEE T-PAMI 2025
Yi-Fan Zhang, et al.
ACM MM 2025
Yi-Fan Zhang, et al.
Preprint
Core Contributor & Corresponding Author Papers
Kwai Keye-VL 1.5 Technical Report Main Contributor 700+
Keye Team, Yi-Fan Zhang (Main Contributor), et al.
Technical Report
Yang Shi, Huanqian Wang, Wulin Xie, Huanyao Zhang, Lijie Zhao, Yi-Fan Zhang†, et al.
Under review at NeurIPS 2025
Tao Yu, Yi-Fan Zhang†, et al.
Under review at EMNLP 2025

Experience

Kuaishou Technology

Research Intern · Multimodal Large Language Models

Skywork AI

Research Intern · Agentic Multimodal Systems

ByteDance

Research Intern

Squirrel AI

Research Intern · LLMs for Education

Alibaba DAMO Academy

Research Intern · Advised by Prof. Rong Jin

Microsoft Research Asia

Research Intern · Advised by Prof. Jingdong Wang

Selected Awards

2025 Best Paper Nomination Award, ADS Track at KDD 2025
2025 AAAI Innovative Applications Award
2023 Top Cited Paper, Neurocomputing
2023 National Scholarship & Outstanding Student, University of Chinese Academy of Sciences
2020 Top Ten Best Student Models, South China University of Technology (Summa Cum Laude)
2020 Jingtang He Technology Innovation Scholarship (Top 1‰, 5 out of 10,000+)
2019 CUMCM National First Prize (Top 1% globally)

Professional Service

Conference Reviewer

ML/AI: ICML (2022-2026), NeurIPS (2022-2026), ICLR (2023-2026), AISTATS (2025), AAAI (2023-2024)
Vision: CVPR (2022-2024), ICCV (2023, 2025), ECCV (2022, 2024)
NLP: ACL (2025), EMNLP (2023-2024), NAACL (2024)

Journal Reviewer

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
IEEE Transactions on Image Processing (TIP)
International Journal of Computer Vision (IJCV)
Transactions on Machine Learning Research (TMLR)
IEEE Transactions on Information Forensics & Security (T-IFS)

Workshop Organizer

PC Member for MILETS@PAKDD'23, DMLR@ICML'23