About Me

I am a fourth-year Ph.D. candidate at SCUT, under the supervision of Prof. Lei Zhang. My research focuses on object perception and understanding.

In research, we have contributed to the field of open-set object detection. We developed text-prompted models including Grounding DINO 1.5, visual-prompted models including T-Rex and T-Rex2, and the unified vision model DINO-X. We are also exploring the next generation of detection models, proposing MLLM-based approaches such as ChatRex, RexSeek, Rex-Thinker, and Rex-Omni.

In open source, I maintain and contribute to several projects. I developed Resophy, an agentic paper-reading tool that helps researchers read papers faster with AI. I created CodeCookbook to share best practices for writing good code. I was a core contributor to MMOCR, OpenMMLab's OCR toolbox, and I maintain the Scene Text Recognition Recommendations repository, which tracks the latest papers, datasets, and SOTA methods.

In products, we have developed practical applications that bridge research and real-world use. CountAnything is an iOS app that uses computer vision for automatic counting in the industrial, agricultural, and aquaculture sectors. T-Rex Label is an online annotation tool designed for complex scenarios, helping users create high-quality datasets efficiently.

I believe open source is fundamental to the sustainable development of the AI community.

Preprint

Publications

Experience

International Digital Economy Academy (IDEA) | Research Intern | 2023.06 – present
Shanghai AI Lab (OpenMMLab) | Intern | 2022.02 – 2022.08