Jiannan Huang
jiannan2003 at gmail dot com
Hi, I am Jiannan Huang, a first-year graduate student majoring in Computer Science at the
Georgia Institute of Technology, where I am fortunate to be
advised by Prof. Humphrey Shi. I am also a Ph.D. research
intern at NVIDIA Cosmos Lab. Prior to
joining Georgia Tech, I received a B.S. in Computer Science from
Beijing Jiaotong University, where I had the privilege of being
supervised by Prof. Yunchao Wei.
Human intelligence and reasoning are not limited to text, but arise from coordinated multimodal
thinking. My long-term goal is to build robust, high-performing multimodal reasoning systems that
enable stronger intelligence and content generation. My research interests lie in data synthesis,
evaluation, model/agentic system design, and training for such multimodal systems.
I am open to any collaboration on topics with which I am familiar. If you would like to collaborate
with me or just chat, feel free to send me an email.
Email |
CV |
GitHub |
Google Scholar |
LinkedIn |
Twitter
|
|
News
- May 2026: I start my new journey at Cosmos Lab @ NVIDIA as a Ph.D. Research Intern.
- Apr 2026: PAI-Bench is selected as a CVPR 2026 Oral, and DuoGen as a CVPR 2026 Highlight. Congrats to all co-authors! 🎉
- Feb 2026: We release Le-DETR, one of my side projects. Thanks to its modern encoder design, Le-DETR is a real-time detector with advanced performance and low training cost.
- Feb 2026: Four papers are accepted to CVPR 2026 (2 main and 2 findings). Congrats to all collaborators! 🎉
- Feb 2026: We release DuoGen, a general-purpose interleaved multimodal generation model! 🚀
- Dec 2025: We release the tech report for PAI-Bench, the first comprehensive benchmark for Physical AI! 🚀
|
Publications
|
|
DuoGen: Towards General Purpose Interleaved Multimodal Generation
Min Shi*, Xiaohui Zeng*, Jiannan Huang, Yin Cui, Francesco Ferroni, Jialuo Li,
Shubham Pachori, Zhaoshuo Li, Yogesh Balaji, Haoxiang Wang, Tsung-Yi Lin, Xiao Fu, Yue Zhao,
Chieh-Yun Chen, Ming-Yu Liu, Humphrey Shi
Conference on Computer Vision and Pattern Recognition (CVPR), 2026
(Highlight)
arXiv |
paper |
project page
A unified framework towards general-purpose interleaved multimodal generation.
|
|
|
Physical AI Bench: A Comprehensive Benchmark for Physical AI Generation and Understanding
Fengzhe Zhou*, Jiannan Huang*, Jialuo Li*, Deva Ramanan, Humphrey Shi
Conference on Computer Vision and Pattern Recognition (CVPR), 2026
(Oral, Award Candidate)
arXiv |
code |
leaderboard |
data
The first comprehensive benchmark for Physical AI generation and understanding.
|
|
|
Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light
Ali Hassani, Fengzhe Zhou, Aditya Kane, Jiannan Huang, Chieh-Yun Chen, Min Shi,
Steven Walton, Markus Hoehnerbach, Vijay Thakkar, Michael Isaev, Qinsheng Zhang, Bing Xu,
Haicheng Wu, Wen-mei Hwu, Ming-Yu Liu, Humphrey Shi
Conference on Computer Vision and Pattern Recognition (CVPR), 2026, Findings
arXiv |
paper |
code
Multi-dimensional sparse attention with high efficiency.
|
|
|
Le-DETR: Revisiting Real-Time Detection Transformer with Efficient Encoder Design
Jiannan Huang, Aditya Kane, Fengzhe Zhou, Yunchao Wei, Humphrey Shi
Conference on Computer Vision and Pattern Recognition (CVPR), 2026, Findings
arXiv |
paper
A real-time detector with advanced performance and low training cost via modern encoder design.
|
|
|
SAGE: Exploring the Boundaries of Unsafe Concept Domain with Semantic-Augment Erasing
Hongguang Zhu, Yunchao Wei, Mengyu Wang, Siyu Jiao, Yan Fang, Jiannan Huang, Yao Zhao
arXiv preprint, 2025
arXiv |
paper
Exploring the boundaries of the unsafe concept domain via semantic-augment erasing.
|
|
|
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
Jiannan Huang, Jun Hao Liew, Hanshu Yan, Yuyang Yin, Yao Zhao, Humphrey Shi, Yunchao Wei
International Conference on Learning Representations (ICLR), 2025
arXiv |
paper |
project page |
code
A simple and effective method for more aligned personalized generation with explicit class guidance.
|
|
|
Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
Siyu Jiao*, Hongguang Zhu*, Jiannan Huang, Yao Zhao, Yunchao Wei, Humphrey Shi
European Conference on Computer Vision (ECCV), 2024
(Oral)
arXiv |
paper |
code
Collaboratively optimizing vision-text representations for open-vocabulary segmentation.
|
|
|
AdGPT: Explore Meaningful Advertising with ChatGPT
Jiannan Huang, Mengxue Qu, Longfei Li, Yunchao Wei
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2025
html |
paper |
code
Exploring meaningful advertising generation with ChatGPT.
|
Education
Georgia Institute of Technology Aug. 2025 - Present
Graduate Student in Computer Science, School of Interactive Computing
Beijing Jiaotong University Sept. 2021 - June 2025
B.S., Computer Science & Technology, School of Computer Science & Technology
|
Academic Experience
Undergraduate Researcher, WEI Lab, Beijing Jiaotong University
Apr. 2022 - Jul. 2025
Mentor: Yunchao Wei
|
Academic Services
- Reviewer: NeurIPS 2026, CVPR 2026, ICLR 2026 & 2025, ICCV 2025.