About Me

I am a first-year Ph.D. candidate at SCUT, under the supervision of Prof. Lei Zhang. I am currently a long-term research intern at the International Digital Economy Academy (IDEA). My research interests focus on object perception and understanding. I am also dedicated to open-source work, which I believe is fundamental to the sustainable development of the AI community.

Preprint

Publications

Products

CountAnything: Powerful Counting App on iOS

CountAnything is a cutting-edge counting application that leverages advanced computer vision algorithms to provide automatic counting capabilities. Whether you're in the industrial, agricultural, or aquaculture sectors, or simply have counting needs, CountAnything makes the process effortless and accurate.

T-Rex Label: Intelligent online annotation tool

T-Rex Label is an intelligent tool designed for annotating complex scenarios across various industries. It is the go-to option for those aiming to streamline their workflows and effortlessly create high-quality datasets.

Open Source

Cookbook to Craft Good Code

In this guide, we'll dive into the essentials of crafting great code. We'll cover everything from naming things clearly to the tools that make coding better and easier.

MMOCR

OpenMMLab Text Detection, Recognition and Understanding Toolbox.

Scene Text Recognition Recommendations

A long-term maintained project that tracks the latest papers, datasets, algorithms, and SOTA results for scene text recognition.

OCR-SAM

Combining MMOCR with Segment Anything & Stable Diffusion. Automatically detects, recognizes, and segments text instances, and supports several downstream tasks, e.g., text removal and text inpainting.

Efficient Deep Learning

Text Recognition on Cross Domain Datasets

Improved text recognition algorithms across different text domains, such as scene text, handwritten text, documents, and Chinese/English text.

Structured Dreambooth LoRA

Dreambooth (LoRA) with a well-organized code structure. A straightforward adaptation from Diffusers.

Invited Talks

Invited talk at MIT on Grounding DINO 1.5 2024.06.14
Invited talk at DJI on T-Rex2 2024.03.29

Experience

International Digital Economy Academy (IDEA) | Research intern 2023.06 – now
Shanghai AI Lab (OpenMMLab) | Intern 2022.02 – 2022.08