Jiannan Huang

jiannan2003 at gmail dot com

Hi, I am Jiannan Huang, a first-year graduate student majoring in Computer Science at the Georgia Institute of Technology, where I am fortunate to be advised by Prof. Humphrey Shi. I am also a Ph.D. research intern at NVIDIA Cosmos Lab. Prior to joining Georgia Tech, I received a B.S. in Computer Science from Beijing Jiaotong University, where I had the privilege of being supervised by Prof. Yunchao Wei.

Human intelligence and reasoning are not limited to text; they arise from coordinated multimodal thinking. My long-term goal is to build robust, high-performing multimodal reasoning systems that enable stronger intelligence and content generation. My research interests lie in data synthesis, evaluation, model and agentic system design, and training for such multimodal systems.

I am open to collaboration on any topic I am familiar with. If you would like to collaborate or just chat, feel free to send me an email.

News

May 2026: I start a new journey at NVIDIA Cosmos Lab as a Ph.D. Research Intern.
Apr 2026: PAI-Bench is selected as a CVPR 2026 Oral, and DuoGen as a CVPR 2026 Highlight. Congrats to all co-authors! 🎉
Feb 2026: We release Le-DETR, one of my side projects. Thanks to a modern encoder design, Le-DETR is a real-time detector with strong performance and low training cost.
Feb 2026: Four papers are accepted to CVPR 2026 (2 main and 2 Findings). Congrats to all collaborators! 🎉
Feb 2026: We release DuoGen, a general-purpose interleaved multimodal generation model! 🚀
Dec 2025: We release the tech report for PAI-Bench, the first comprehensive benchmark for Physical AI! 🚀

Publications

DuoGen: Towards General Purpose Interleaved Multimodal Generation
Min Shi, Xiaohui Zeng, Jiannan Huang, Yin Cui, Francesco Ferroni, Jialuo Li, Shubham Pachori, Zhaoshuo Li, Yogesh Balaji, Haoxiang Wang, Tsung-Yi Lin, Xiao Fu, Yue Zhao, Chieh-Yun Chen, Ming-Yu Liu, Humphrey Shi
Conference on Computer Vision and Pattern Recognition (CVPR), 2026 (Highlight)
arXiv | paper | project page
A unified framework towards general-purpose interleaved multimodal generation.

Physical AI Bench: A Comprehensive Benchmark for Physical AI Generation and Understanding
Fengzhe Zhou, Jiannan Huang, Jialuo Li, Deva Ramanan, Humphrey Shi
Conference on Computer Vision and Pattern Recognition (CVPR), 2026 (Oral, Award Candidate)
arXiv | code | leaderboard | data
The first comprehensive benchmark for Physical AI generation and understanding.

Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light
Ali Hassani, Fengzhe Zhou, Aditya Kane, Jiannan Huang, Chieh-Yun Chen, Min Shi, Steven Walton, Markus Hoehnerbach, Vijay Thakkar, Michael Isaev, Qinsheng Zhang, Bing Xu, Haicheng Wu, Wen-mei Hwu, Ming-Yu Liu, Humphrey Shi
Conference on Computer Vision and Pattern Recognition (CVPR), 2026, Findings
arXiv | paper | code
Multi-dimensional sparse attention with high efficiency.

Le-DETR: Revisiting Real-Time Detection Transformer with Efficient Encoder Design
Jiannan Huang, Aditya Kane, Fengzhe Zhou, Yunchao Wei, Humphrey Shi
Conference on Computer Vision and Pattern Recognition (CVPR), 2026, Findings
arXiv | paper
A real-time detector with advanced performance and low training cost via modern encoder design.

SAGE: Exploring the Boundaries of Unsafe Concept Domain with Semantic-Augment Erasing
Hongguang Zhu, Yunchao Wei, Mengyu Wang, Siyu Jiao, Yan Fang, Jiannan Huang, Yao Zhao
arXiv preprint, 2025
arXiv | paper
Exploring the boundaries of the unsafe concept domain via semantic-augment erasing.

ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
Jiannan Huang, Jun Hao Liew, Hanshu Yan, Yuyang Yin, Yao Zhao, Humphrey Shi, Yunchao Wei
International Conference on Learning Representations (ICLR), 2025
arXiv | paper | project page | code
A simple and effective method for more aligned personalized generation with explicit class guidance.

Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
Siyu Jiao, Hongguang Zhu, Jiannan Huang, Yao Zhao, Yunchao Wei, Humphrey Shi
European Conference on Computer Vision (ECCV), 2024 (Oral)
arXiv | paper | code
Collaboratively optimizing vision-text representations for open-vocabulary segmentation.

AdGPT: Explore Meaningful Advertising with ChatGPT
Jiannan Huang, Mengxue Qu, Longfei Li, Yunchao Wei
Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2025
html | paper | code
Exploring meaningful advertising generation with ChatGPT.

Education

Georgia Institute of Technology Aug. 2025 - Present
Graduate Student in Computer Science, School of Interactive Computing

Beijing Jiaotong University Sept. 2021 - June 2025
B.S., Computer Science & Technology, School of Computer Science & Technology

Work Experience

Research Intern, NVIDIA Cosmos Lab May 2026 - Present
Mentor: Dr. Ming-Yu Liu

Academic Experience

Researcher, SHI Labs, Interactive Computing @ Georgia Tech Jun. 2024 - Present
Mentor: Humphrey Shi

Visiting Student, Knowledge Engineering Group (KEG), Tsinghua University May 2023 - Sept. 2023
Mentors: Jiazheng Xu, Jie Tang

Undergraduate Researcher, WEI Lab, Beijing Jiaotong University Apr. 2022 - Jul. 2025
Mentor: Yunchao Wei

Academic Services

Reviewer: NeurIPS 2026, CVPR 2026, ICLR 2026 & 2025, ICCV 2025.
