🎈 About Me

I am currently a third-year Ph.D. student at AI Thrust, Information Hub, The Hong Kong University of Science and Technology (Guangzhou), supervised by Prof. Li Liu and Prof. Hui Xiong. Previously, I received my master's degree from City University of Hong Kong, advised by Prof. Xiangyu Zhao. I am also a research intern at the WeChat Security Team, Tencent.

My research focuses on AI Security and Safety, with an emphasis on developing robust defense mechanisms and understanding adversarial vulnerabilities in deep learning systems. My recent work spans backdoor defense, large model safety (audio and diffusion models), and red-teaming agents. I have published multiple first-author papers at top-tier venues including ICML, NeurIPS, AAAI, KDD, and WWW.

Research Interests

  • AI Security / Safety
  • Red-teaming Agents
  • Backdoor Learning
  • AIGC Safety

📝 Selected Papers

(* indicates equal contribution, # indicates corresponding author)

Publications

ICML 2026

SARSteer: Safeguarding Large Audio Language Models via Safe-Ablated Refusal Steering

Weilin Lin, Jianze Li, Hui Xiong, Li Liu#

  • The first inference-time defense framework specifically designed for Large Audio Language Models (LALMs).
  • Proposes text-derived refusal steering to enforce refusal without manipulating audio inputs.
  • Introduces decomposed safety space ablation to mitigate over-refusal on benign speech queries.

The Forty-Third International Conference on Machine Learning (ICML), Seoul, South Korea, 2026

NeurIPS 2025 D&B

BackdoorDM: A Comprehensive Benchmark for Backdoor Learning on Diffusion Model

Weilin Lin*, Nanjun Zhou*, Yanyun Wang, Jianze Li, Hui Xiong, Li Liu#

  • The first comprehensive benchmark for backdoor learning on diffusion models.
  • Proposes a unified attack formulation and a systematic target taxonomy.
  • Supports 9 diffusion backdoor attacks, 5 defense methods, and 3 visualization tools.

The Thirty-Ninth Annual Conference on Neural Information Processing Systems Datasets & Benchmarks Track (NeurIPS D&B), San Diego, California, USA, 2025

AAAI 2025 Oral

Fusing Pruned and Backdoored Models: Optimal Transport-based Data-free Backdoor Mitigation (Oral Presentation, 4.6%)

Weilin Lin, Li Liu#, Jianze Li, Hui Xiong

  • One of the few data-free defense strategies against backdoor attacks.
  • First adaptation of optimal transport (OT) and model fusion to backdoor defense.

The Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), Philadelphia, Pennsylvania, USA, 2025

NeurIPS 2024

Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness

Weilin Lin, Li Liu#, Shaokui Wei, Jianze Li, Hui Xiong

  • Provides new insights into unlearning weight changes and backdoor activeness.
  • Proposes an effective defense strategy using reinitialization and fine-tuning.

Annual Conference on Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2024

Preprints

📖 Education

  • Ph.D., Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou), September 2023 - Present
  • M.Sc., Multimedia Information Technology, City University of Hong Kong, August 2021 - October 2022
  • B.S., Electronic Information Science and Technology, South China Normal University, September 2017 - July 2021

🧭 Experience

  • 2025.12 - Present, Research Intern, WeChat Security Team, Tencent. Research on Red Teaming Agents.
  • 2023.03 - 2023.09, Research Assistant, The Hong Kong University of Science and Technology (Guangzhou). Advisor: Prof. Li Liu.
  • 2022.07 - 2023.03, Research Assistant, USAIL Group, HKUST Fok Ying Tung Research Institute / The Hong Kong University of Science and Technology (Guangzhou). Advisor: Prof. Hao Liu.
  • 2021.09 - 2022.10, Research Assistant, Applied Machine Learning Lab (AML Lab), City University of Hong Kong. Advisor: Prof. Xiangyu Zhao.

🎖 Honors and Awards

  • Outstanding Paper Award (Second Prize), The 15th Guangdong-Hong Kong-Macao Conference on Image and Graphics, December 2025. Awarded for the paper “BackdoorDM: A Comprehensive Benchmark for Backdoor Learning on Diffusion Model”.
  • Best Student Paper Award Finalist at ICSR 2024.
  • Full Postgraduate Scholarship, HKUST(GZ).

🍀 Services

Reviewer/External Reviewer

  • The International Conference on Learning Representations (ICLR), 2026
  • The International Conference on Machine Learning (ICML), 2025, 2026
  • Annual Conference on Neural Information Processing Systems (NeurIPS), 2024, 2025
  • Annual AAAI Conference on Artificial Intelligence (AAAI), 2024, 2025, 2026
  • Conference on Research and Development in Information Retrieval (SIGIR), 2024