Qinghua Zhou
Postdoctoral Research Associate, King's College London

About me: I am a researcher and engineer working on the safety and alignment of artificial intelligence systems. My research interests lie in exploring pathways towards robust, stable and trustworthy AI. These explorations include principled theoretical and computational analysis of modern computer vision models and large language models: their structures, properties and methods of optimization. They also include the development of high-performance software, efficient scaling of large-scale simulations, and interactive demos. My current focus is on safety and alignment through direct, deterministic intervention on model weights with theoretical guarantees, e.g. methods to stain, lock, edit or attack AI models.

I am currently a Postdoctoral Research Associate in Methods and Algorithms of AI in the Department of Mathematics at King's College London (KCL), where I am fortunate to work with Prof. Ivan Y. Tyukin. I received my PhD from the University of Leicester, where I focused on learning from high-dimensional, low-sample-size data in medical applications. Before that, I received a Bachelor of Science degree from the University of Sydney (USYD), majoring in both Physics and Applied Mathematics, where I was first introduced to scientific research through asteroseismology.

Selected Works

Please see my Google Scholar page for a full list of publications.

Stealth edits for large language models
Oliver J. Sutton*, Qinghua Zhou*, Wei Wang, Desmond J. Higham, Alexander N. Gorban, Alexander Bastounis, Ivan Y. Tyukin
NeurIPS, 2024
Paper | Code | Hugging Face Demo | SIAM News Article

This work exposes the susceptibility of modern AI models to a new class of malicious attacks and offers a new theoretical understanding of why they arise: when an attacker supplies a specific trigger prompt, the model generates the attacker's desired output. Viewed from the owner's side, the same mechanism provides a new method for model editing. The work makes it possible either to (1) hide an attack that is virtually impossible to detect or mitigate, or (2) introduce external model components supporting 10,000 or more specific edits per layer with almost no impact on the model's original capabilities.
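As a rough illustration of the underlying mechanism (and emphatically not the construction from the paper), the hypothetical sketch below shows how a single detector neuron spliced into a layer can hijack the output on one trigger input while leaving every other input untouched; `base_layer`, `h_trigger`, `payload` and the threshold are all illustrative choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512  # hidden dimension (toy scale)

def base_layer(h):
    # Stand-in for an unmodified transformer sub-layer.
    return h

# Hypothetical trigger: the hidden state the attacker's prompt produces.
h_trigger = rng.standard_normal(d)
h_trigger /= np.linalg.norm(h_trigger)

# Output the attacker (or, benignly, the model editor) wants to inject.
payload = rng.standard_normal(d)

def edited_layer(h, threshold=0.9):
    """Layer augmented with one detector neuron.

    In high dimension, random unit vectors are nearly orthogonal, so the
    detector essentially never fires on ordinary inputs; many such
    detectors can therefore coexist without interfering.
    """
    gate = np.dot(h / np.linalg.norm(h), h_trigger) > threshold
    return payload if gate else base_layer(h)

h_normal = rng.standard_normal(d)
assert np.allclose(edited_layer(h_normal), base_layer(h_normal))  # untouched
assert np.allclose(edited_layer(h_trigger), payload)              # hijacked
```

The same concentration-of-measure effect that makes the gate reliable also makes the edit hard to trip over by chance, which is why such edits can be so difficult to detect.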

Staining and Locking Computer Vision Models without Retraining
Oliver J. Sutton*, Qinghua Zhou*, George Leete, Alexander N. Gorban, Ivan Y. Tyukin
ICCV, 2025

This work provides a method to (1) stain, i.e. watermark, a model so its owner can identify it, and (2) lock a model so that, if stolen, it operates with only limited functionality. The method applies to most classification, object detection and image generation models, and the stain/lock is embedded entirely within the model's architecture and weights. It is the first of its kind to require no retraining, and it comes with theoretical guarantees on performance and on robustness to pruning and fine-tuning.
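The caricature below conveys only the flavour of locking, under the simplifying assumption that a lock gates normal behaviour on a secret key pattern in the input; the real construction lives inside the weights and carries guarantees this toy does not, and `key`, `locked_model` and all parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_classes = 1024, 10

W = 0.1 * rng.standard_normal((n_classes, d))  # toy linear classifier

# Secret key: a fixed pattern the owner embeds into inputs (think of a
# small pixel patch); here simply a secret unit direction in input space.
key = rng.standard_normal(d)
key /= np.linalg.norm(key)

def locked_model(x, threshold=0.8):
    """Classifier that behaves normally only when the key is present."""
    unlocked = np.dot(x / np.linalg.norm(x), key) > threshold
    if not unlocked:
        # Without the key the model still runs, but its predictions
        # carry no signal -- "limited functionality".
        return int(rng.integers(n_classes))
    return int(np.argmax(W @ x))

x = rng.standard_normal(d)
stolen_pred = locked_model(x)                               # effectively random
owner_pred = locked_model(x + 5 * np.linalg.norm(x) * key)  # key unlocks it
```

A stain works analogously in reverse: the owner queries a suspect model with a secret trigger and checks for a predetermined, otherwise wildly improbable response.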

How adversarial attacks can disrupt seemingly stable accurate classifiers
Oliver J. Sutton, Qinghua Zhou, Ivan Y. Tyukin, Alexander N. Gorban, Alexander Bastounis, Desmond J. Higham
Neural Networks, 2024

This work demonstrates a fundamental feature of classifiers on high-dimensional data: simultaneous susceptibility to adversarial attacks and robustness to random perturbations. We provide theoretical and empirical evidence that additive noise, whether applied during training or testing, is inefficient for eradicating or detecting adversarial examples, and is fundamentally not a good approach for certifying robustness.
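The core dimensional effect can be reproduced in a few lines of numpy for a plain linear classifier (a sketch of the phenomenon, not the paper's experiments): a targeted perturbation just larger than the margin always flips the label, while random perturbations of the same norm essentially never do, because their component along the boundary normal scales like 1/√d:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10_000                      # high input dimension

w = rng.standard_normal(d)
w /= np.linalg.norm(w)          # unit normal of a linear decision boundary

x = rng.standard_normal(d)
x += (1.0 - w @ x) * w          # place x at distance exactly 1 from boundary

eps = 1.1                       # perturbation budget, just over the margin

# Targeted attack: step straight across the boundary -- always succeeds.
print("adversarial flip:", np.sign(w @ (x - eps * w)) != np.sign(w @ x))

# Random perturbations of the SAME norm: their component along w is only
# O(eps / sqrt(d)), so label flips are vanishingly rare.
flips = 0
for _ in range(1000):
    delta = rng.standard_normal(d)
    delta *= eps / np.linalg.norm(delta)
    flips += int(np.sign(w @ (x + delta)) != np.sign(w @ x))
print("random flips out of 1000:", flips)   # typically 0
```

This is also why noise-based testing is a weak detector: sampling random directions almost never finds the rare adversarial ones.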

Multiple-instance ensemble for construction of deep heterogeneous committees for high-dimensional low-sample-size data
Qinghua Zhou, Shuihua Wang, Hengde Zhu, Xin Zhang, Yu-Dong Zhang
Neural Networks, 2023

A key problem in many practical applications is learning from datasets with small sample sizes and high-dimensional representations. Here, we introduce a novel class of stacking methods that utilise attention pooling mechanisms to construct both ensembles and cascades; a minimal sketch of the pooling step appears below. The method is empirically validated for its functionality and performance on a range of medical datasets.
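The sketch shows the kind of attention pooling commonly used in multiple-instance learning (scores from a small tanh network, normalised by a softmax), applied to a bag of base-learner outputs; the numpy implementation, shapes and parameter names are my own stand-ins rather than the paper's code:

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_pool(H, V, w):
    """Attention pooling over a 'bag' of embeddings.

    H: (n_instances, d) array; here each row stands in for the output
    of one heterogeneous base learner in the stack.
    Returns the pooled embedding and the attention weights.
    """
    scores = np.tanh(H @ V.T) @ w    # (n_instances,)
    a = softmax(scores)              # one weight per base learner
    return a @ H, a

n_learners, d, d_attn = 5, 16, 8
H = rng.standard_normal((n_learners, d))    # stand-in learner outputs
V = rng.standard_normal((d_attn, d))        # attention parameters
w = rng.standard_normal(d_attn)

pooled, attn = attention_pool(H, V, w)
print("weights over base learners:", np.round(attn, 3))  # sums to 1
```

Because the weights are learned rather than fixed, the ensemble can down-weight unreliable base learners per-sample, which is the appeal of attention pooling over simple averaging in low-sample-size regimes.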

More works can be found on my Google Scholar page.