I am now a first-year Ph.D. candidate at SCUT, under the supervision of Prof. Lei Zhang.
I am currently doing a long-term research internship at the
International Digital Economy Academy (IDEA).
My research interests focus on Object Perception and Understanding. I am also dedicated
to open-source work, which I believe is fundamental to the sustainable development of the AI community.
Preprint
IDEA Research's Most Capable Open-World Object Detection Model Series
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
CountAnything is a cutting-edge counting application that leverages advanced computer vision algorithms to provide automatic
counting capabilities. Whether you're in the industrial, agricultural, or aquaculture sectors, or simply have counting needs,
CountAnything makes the process effortless and accurate.
T-Rex Label: Intelligent online annotation tool
T-Rex Label is an intelligent tool designed for annotating complex scenarios, applicable across various industries.
It is the go-to option for those aiming to streamline their workflows and effortlessly create high-quality datasets.
Open Source
Cookbook to Craft Good Code
In this guide, we'll dive into the essentials of crafting great code. We'll cover everything from naming things clearly to tools that make coding better and easier.
MMOCR
OpenMMLab Text Detection, Recognition and Understanding Toolbox.
Scene Text Recognition Recommendations
A long-maintained project that tracks the latest papers, datasets, algorithms, and SOTA results for
scene text recognition.
OCR-SAM
Combining MMOCR with Segment Anything & Stable Diffusion. Automatically detects, recognizes, and
segments text instances, with several downstream tasks, e.g., Text Removal and Text Inpainting.
Efficient Deep Learning
Text Recognition on Cross Domain Datasets
Improved text recognition algorithms across different text domains, such as scene text, handwritten text,
documents, and Chinese/English text.
Structured Dreambooth LoRA
Dreambooth (LoRA) with a well-organized code structure. A naive adaptation from Diffusers.
Invited Talks
Invited talk at MIT on Grounding DINO 1.5
2024.06.14
Invited talk at DJI on T-Rex2
2024.03.29
Experience
International Digital Economy Academy (IDEA) | Research intern