Publications

2025

  1. arXiv
    OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
    Keda Tao, Kele Shao, Bohan Yu, Weiqiang Wang, Jian Liu, and Huan Wang
    arXiv preprint arXiv:2511.14582, 2025
  2. arXiv
    MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
    Xin Jin, Siyuan Li, Siyong Jian, Kai Yu, and Huan Wang
    arXiv preprint arXiv:2510.23479, 2025
  3. arXiv
    StreamingTOM: Streaming Token Compression for Efficient Video Understanding
    arXiv preprint arXiv:2510.18269, 2025
  4. arXiv
    Which Heads Matter for Reasoning? RL-Guided KV Cache Compression
    Wenjie Du, Li Jiang, Keda Tao, Xue Liu, and Huan Wang
    arXiv preprint arXiv:2510.08525, 2025
  5. arXiv
    OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
    arXiv preprint arXiv:2510.06751, 2025
  6. arXiv
    RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
    Sicheng Feng, Kaiwen Tuo, Song Wang, Lingdong Kong, Jianke Zhu, and Huan Wang
    arXiv preprint arXiv:2510.02240, 2025
  7. arXiv
    When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
    arXiv preprint arXiv:2507.20198, 2025
  8. arXiv
    SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot
    arXiv preprint arXiv:2506.09613, 2025
  9. arXiv
    ResSVD: Residual Compensated SVD for Large Language Model Compression
    Haolei Bai, Siyong Jian, Tuo Liang, Yu Yin, and Huan Wang
    arXiv preprint arXiv:2505.20112, 2025
  10. arXiv
    Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
    Sicheng Feng, Song Wang, Shuyi Ouyang, Lingdong Kong, Zikai Song, Jianke Zhu, Huan Wang, and Xinchao Wang
    arXiv preprint arXiv:2505.18675, 2025
  11. arXiv
    Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
    arXiv preprint arXiv:2503.16257, 2025
  12. arXiv
    Autoregressive Image Generation with Randomized Parallel Decoding
    Haopeng Li, Jinyue Yang, Guoqi Li, and Huan Wang
    arXiv preprint arXiv:2503.10568, 2025
  13. NeurIPS
    HoliTom: Holistic Token Merging for Fast Video Large Language Models
    NeurIPS, 2025
  14. NeurIPS
    FreqExit: Enabling Early-Exit Inference for Visual Autoregressive Models via Frequency-Aware Guidance
    Ying Li, Chengfei Lv, and Huan Wang
    NeurIPS, 2025
  15. CVPR
    DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
    CVPR, 2025
  16. ICCV
    On-Device Diffusion Transformer Policy for Efficient Robot Manipulation
    Yiming Wu, Huan Wang, Zhenghao Chen, Jianxin Pang, and Dong Xu
    ICCV, 2025
  17. ACM MM
    Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models
    Yiming Wu, Zhenghao Chen, Huan Wang, and Dong Xu
    ACM MM, 2025
  18. TCSVT
    Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View
    Xianzu Wu, Zhenxin Ai, Harry Yang, Ser-Nam Lim, Jun Liu, and Huan Wang
    IEEE Transactions on Circuits and Systems for Video Technology, 2025

2024

  1. arXiv
    Is Oracle Pruning the True Oracle?
    arXiv preprint arXiv:2412.00143, 2024
  2. ECCV Oral
    A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging
    Miao Cao, Lishun Wang, Huan Wang, and Xin Yuan
    ECCV, 2024
  3. ACM MM
    Towards Real-time Video Compressive Sensing on Mobile Devices
    Miao Cao, Lishun Wang, Huan Wang, Guoqing Wang, and Xin Yuan
    ACM MM, 2024