Publications

Publications by category in reverse chronological order. Generated by jekyll-scholar.

2026

  1. CVPR
    OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
    Keda Tao, Kele Shao, Bohan Yu, Weiqiang Wang, Jian Liu, and Huan Wang
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
  2. CVPR
    StreamingTOM: Streaming Token Compression for Efficient Video Understanding
    Xueyi Chen, Keda Tao, Kele Shao, and Huan Wang
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
  3. CVPR
    Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
    Sicheng Feng, Song Wang, Shuyi Ouyang, Lingdong Kong, Zikai Song, Jianke Zhu, Huan Wang, and Xinchao Wang
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
  4. ICLR
    MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
    Xin Jin, Siyuan Li, Siyong Jian, Kai Yu, and Huan Wang
    In International Conference on Learning Representations (ICLR), 2026
  5. ICLR
    OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
    In International Conference on Learning Representations (ICLR), 2026
  6. ICLR
    RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
    Sicheng Feng, Kaiwen Tuo, Song Wang, Lingdong Kong, Jianke Zhu, and Huan Wang
    In International Conference on Learning Representations (ICLR), 2026
  7. ICLR
    Autoregressive Image Generation with Randomized Parallel Decoding
    Haopeng Li, Jinyue Yang, Guoqi Li, and Huan Wang
    In International Conference on Learning Representations (ICLR), 2026
  8. CPAL Oral
    ROSE: Reordered SparseGPT for More Accurate One-Shot Large Language Models Pruning
    Mingluo Su and Huan Wang
    In Conference on Parsimony and Learning (CPAL), 2026
  9. CPAL
    ResSVD: Residual Compensated SVD for Large Language Model Compression
    Haolei Bai, Siyong Jian, Tuo Liang, Yu Yin, and Huan Wang
    In Conference on Parsimony and Learning (CPAL), 2026
  10. TMLR
    When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
    Transactions on Machine Learning Research (TMLR), 2026
  11. arXiv
    LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs
    Keda Tao, Yuhua Zheng, Jia Xu, Wenjie Du, Kele Shao, Hesong Wang, Xueyi Chen, Xin Jin, Junhan Zhu, Bohan Yu, Weiqiang Wang, Jian Liu, Can Qin, Yulun Zhang, Ming-Hsuan Yang, and Huan Wang
    arXiv preprint arXiv:2603.19217, 2026
  12. arXiv
    MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?
    Xingze Zou, Jing Wang, Yuhua Zheng, Xueyi Chen, Haolei Bai, Lingcheng Kong, Syed A.R. Abu-Bakar, Zhaode Wang, Chengfei Lv, Haoji Hu, and Huan Wang
    arXiv preprint arXiv:2603.11935, 2026
  13. arXiv
    DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels
    Haolei Bai, Lingcheng Kong, Xueyi Chen, Jianmian Wang, Zhiqiang Tao, and Huan Wang
    arXiv preprint arXiv:2602.11715, 2026

2025

  1. NeurIPS
    HoliTom: Holistic Token Merging for Fast Video Large Language Models
    In Advances in Neural Information Processing Systems (NeurIPS), 2025
  2. NeurIPS
    Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs
    Kejia Zhang, Keda Tao, Jiasheng Tang, and Huan Wang
    In Advances in Neural Information Processing Systems (NeurIPS), 2025
  3. NeurIPS
    FreqExit: Enabling Early-Exit Inference for Visual Autoregressive Models via Frequency-Aware Guidance
    Ying Li, Chengfei Lv, and Huan Wang
    In Advances in Neural Information Processing Systems (NeurIPS), 2025
  4. CVPR
    DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
    Keda Tao, Can Qin, Haoxuan You, Yang Sui, and Huan Wang
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
  5. ICCV
    On-Device Diffusion Transformer Policy for Efficient Robot Manipulation
    Yiming Wu, Huan Wang, Zhenghao Chen, Jianxin Pang, and Dong Xu
    In IEEE/CVF International Conference on Computer Vision (ICCV), 2025
  6. TCSVT
    Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View
    Xianzu Wu, Zhenxin Ai, Harry Yang, Ser-Nam Lim, Jun Liu, and Huan Wang
    IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2025
  7. arXiv
    Active Perception Agent for Omnimodal Audio-Video Understanding
    Keda Tao, Wenjie Du, Bohan Yu, Weiqiang Wang, Jian Liu, and Huan Wang
    arXiv preprint arXiv:2512.23646, 2025
  8. arXiv
    Which Heads Matter for Reasoning? RL-Guided KV Cache Compression
    Wenjie Du, Li Jiang, Keda Tao, Xue Liu, and Huan Wang
    arXiv preprint arXiv:2510.08525, 2025
  9. arXiv
    ConCuR: Conciseness Makes State-of-the-Art Kernel Generation
    Lingcheng Kong, Jiateng Wei, Hanzhang Shen, and Huan Wang
    arXiv preprint arXiv:2510.07356, 2025
  10. arXiv
    SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot
    Kaiwen Tuo and Huan Wang
    arXiv preprint arXiv:2506.09613, 2025
  11. arXiv
    Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
    Keda Tao, Haoxuan You, Yang Sui, Can Qin, and Huan Wang
    arXiv preprint arXiv:2503.16257, 2025

2024

  1. arXiv
    Is Oracle Pruning the True Oracle?
    Sicheng Feng, Keda Tao, and Huan Wang
    arXiv preprint arXiv:2412.00143, 2024