🎈 About Me

I am currently a third-year Ph.D. student in the AI Thrust, Information Hub, The Hong Kong University of Science and Technology (Guangzhou), supervised by Prof. Li Liu and Prof. Hui Xiong. Previously, I received my master's degree from City University of Hong Kong, advised by Prof. Xiangyu Zhao. I am also a research intern at the WeChat Security Team, Tencent.

My research focuses on AI Security and Safety, with an emphasis on developing robust defense mechanisms and understanding adversarial vulnerabilities in deep learning systems. My recent work spans backdoor defense, large model safety (audio and diffusion models), and red-teaming agents. I have published multiple first-author papers at top-tier venues including ICML, NeurIPS, AAAI, KDD, and WWW.

Research Interests

AI Security / Safety
Red-teaming Agents
Backdoor Learning
AIGC Safety

📝 Selected Papers

(* indicates equal contribution, # indicates corresponding author)

Publications

ICML 2026

SARSteer: Safeguarding Large Audio Language Models via Safe-Ablated Refusal Steering

Weilin Lin, Jianze Li, Hui Xiong, Li Liu^#

The first inference-time defense framework specifically designed for Large Audio Language Models (LALMs).
Proposes text-derived refusal steering to enforce refusal without manipulating audio inputs.
Introduces decomposed safety space ablation to mitigate over-refusal on benign speech queries.

The Forty-Third International Conference on Machine Learning (ICML), Seoul, South Korea, 2026

NeurIPS 2025 D&B

BackdoorDM: A Comprehensive Benchmark for Backdoor Learning on Diffusion Model

Weilin Lin^*, Nanjun Zhou^*, Yanyun Wang, Jianze Li, Hui Xiong, Li Liu^#

The first comprehensive benchmark for backdoor learning on diffusion models.
Proposes a unified attack formulation and a systematic target taxonomy.
Supports 9 diffusion backdoor attacks, 5 defense methods, and 3 visualization tools.

The Thirty-Ninth Annual Conference on Neural Information Processing Systems Datasets & Benchmarks Track (NeurIPS D&B), San Diego, California, USA, 2025

AAAI 2025 Oral

Fusing Pruned and Backdoored Models: Optimal Transport-based Data-free Backdoor Mitigation (Oral Presentation, 4.6%)

Weilin Lin, Li Liu^#, Jianze Li, Hui Xiong

One of the few data-free defense strategies against backdoor attacks.
First adaptation of optimal transport (OT) and model fusion to backdoor defense.

The Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), Philadelphia, Pennsylvania, USA, 2025

NeurIPS 2024

Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness

Weilin Lin, Li Liu^#, Shaokui Wei, Jianze Li, Hui Xiong

New insights on unlearning weight change and backdoor activeness.
Proposes an effective defense strategy using reinitialization and fine-tuning.

Annual Conference on Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2024

ICASSP 2025 Gradient Norm-based Fine-Tuning for Backdoor Defense in Automatic Speech Recognition, Nanjun Zhou^*, Weilin Lin^*, Li Liu^#.
WWW 2023 AutoDenoise: Automatic Data Instance Denoising for Recommendations, Weilin Lin, Xiangyu Zhao^#, Yejing Wang, Yuanshao Zhu, Wanyu Wang.
KDD 2022 AdaFS: Adaptive Feature Selection in Deep Recommender System, Weilin Lin, Xiangyu Zhao^#, Yejing Wang, Tong Xu, Xian Wu.

Preprints

arXiv 2026 RedEdit: Agentic Red-Teaming of Image Safety Classifiers via MCTS-Guided Photo-Editing, Weilin Lin, Ziqi Lin, Zhenxing Zhou, Jianze Li, Tong Zhang, Hui Xiong, Li Liu^#.
arXiv 2024 Segment Anything for Videos: A Systematic Survey, Chunhui Zhang, Yawen Cui, Weilin Lin, Guanjie Huang, Yan Rong, Li Liu^#, Shiguang Shan.
arXiv 2023 A Comprehensive Survey on Segment Anything Model for Vision and Beyond, Chunhui Zhang, Li Liu^#, Yawen Cui, Guanjie Huang, Weilin Lin, Yiqian Yang, Yuehong Hu.

📖 Education

Ph.D., Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou), September 2023 - Present
- Advisors: Prof. Li Liu and Prof. Hui Xiong
M.Sc., Multimedia Information Technology, City University of Hong Kong, August 2021 - October 2022
- Advisor: Prof. Xiangyu Zhao
B.S., Electronic Information Science and Technology, South China Normal University, September 2017 - July 2021

🧭 Experience

2025.12 - Present, Research Intern, WeChat Security Team, Tencent. Research on red-teaming agents.
2023.03 - 2023.09, Research Assistant, The Hong Kong University of Science and Technology (Guangzhou). Advisor: Prof. Li Liu.
2022.07 - 2023.03, Research Assistant, USAIL Group, HKUST Fok Ying Tung Research Institute / The Hong Kong University of Science and Technology (Guangzhou). Advisor: Prof. Hao Liu.
2021.09 - 2022.10, Research Assistant, Applied Machine Learning Lab (AML Lab), City University of Hong Kong. Advisor: Prof. Xiangyu Zhao.

🎖 Honors and Awards

Gold Reviewer Award, ICML 2026.
Merit Prize, Best Research Award (2025–26), AI Thrust, HKUST(GZ).
Outstanding Paper Award (Second Prize), The 15th Guangdong-Hong Kong-Macao Conference on Image and Graphics, December 2025. Awarded for the paper “BackdoorDM: A Comprehensive Benchmark for Backdoor Learning on Diffusion Model”.
Best Student Paper Award Finalist, ICSR 2024.
Full Postgraduate Scholarship, HKUST(GZ).

🍀 Services

Reviewer/External Reviewer

The International Conference on Learning Representations (ICLR), 2026
The International Conference on Machine Learning (ICML), 2025, 2026
Annual Conference on Neural Information Processing Systems (NeurIPS), 2024, 2025
Annual AAAI Conference on Artificial Intelligence (AAAI), 2024, 2025, 2026
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024

Weilin Lin (林威霖)