CV
Education
Peking University
B.S. in Information and Computing Science, School of EECS
Sep. 2023 – Present
- GPA: 3.77 / 4.0
- Relevant Courses:
Computer Vision, Multimodal Learning, Image Processing,
Visual Computing, Machine Learning, Algorithm Design and Analysis
Research & Internship Experience
Wangxuan Institute of Computer Technology, Peking University
Research Intern
Sep. 2024 – Present
- Conduct research on computer vision and generative models
- Collaborate on academic projects under the supervision of Zhouhui Lian
Course Projects
From LineArt to Yungang Grottoes: Multi-Structural Conditioned Image Generation
Nov. 2025 – Jan. 2026
- Built a structure-guided image generation framework based on Stable Diffusion + ControlNet
- Extended single-condition ControlNet to multi-condition control (sketch, depth, surface normals)
- Improved geometric consistency and spatial hierarchy in stone-carving generation
PRGAN: GAN-based Reconstruction of Pottery Fragments
Dec. 2024 – Jan. 2025
- Modeled pottery restoration as a 3D generative reconstruction task
- Adopted voxel representation with surface normal priors for geometric consistency
- Enabled automatic reconstruction from sparse archaeological fragments
CVToolkit: C++ Image and Video Processing System
May 2024 – Jun. 2024
- Developed a C++ image & video editing toolkit
- Implemented image enhancement, filters, GIF generation, and video editing
- Integrated OpenCV and FFmpeg for video decoding, filtering, and playback control
Publications
Beyond Patches: Global-aware Autoregressive Model for Multimodal Few-Shot Font Generation
CVPR 2026
- Proposed GAR-Font, a global-aware autoregressive framework that models font generation beyond patch-level representations.
- Designed a global-aware tokenizer (G-Tok) and a lightweight multimodal style encoder, enabling holistic font modeling and flexible text-guided control from few references.
- Introduced a post-training refinement pipeline with LoRA-based NFA and GRPO-based SE to further improve global style faithfulness and structural fidelity.
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning
NeurIPS 2025 Datasets and Benchmarks Track
- Introduced Knowledge Image Generation as a new evaluation task
- Built a large-scale benchmark spanning 10 disciplines and 6 educational levels
- Contributed to data collection, annotation protocol design, quality control, and baseline evaluation
Honors & Awards
- Award for Academic Excellence, Peking University (2024)
- BYD Scholarship, Peking University (2024)
- Award for Scientific Research, Peking University (2025)
- Tianchuang Scholarship, Peking University (2025)
Technical Skills
- Programming: Python (PyTorch), C++, Java
- Vision & ML: Computer Vision, Image Generation, Multimodal Learning, Post-Training
- Languages: Mandarin (Native), English (Fluent, CET-6)