|
Selected Works
Please see my Google Scholar page for a full list of publications.
Stealth edits for large language models
Oliver J. Sutton*, Qinghua Zhou*, Wei Wang, Desmond J. Higham, Alexander N. Gorban, Alexander Bastounis, Ivan Y. Tyukin
NeurIPS, 2024
Paper | Code | Huggingface Demo | SIAM News Article
We expose the susceptibility of modern AI models to a new class of malicious attack and give a theoretical account of why it succeeds. The approach can either attach external model components supporting easily 10,000 edits per layer with almost no impact on the model's normal behaviour, or hide an attack that is virtually impossible to detect; even if the attack is found, the triggering prompt cannot be recovered.
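For intuition, here is a toy numpy sketch (illustrative only, not the paper's algorithm; all names and constants below are assumptions) of why such edits can be stealthy in high dimensions: a rank-one update to a weight matrix can be aligned with a trigger direction that is nearly orthogonal to typical clean activations, so normal behaviour barely moves while the trigger's output changes dramatically.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096                                    # hidden width: the high-dimensional regime
W = rng.normal(size=(d, d)) / np.sqrt(d)    # stand-in for one layer's weights

# In high dimensions a random unit vector is nearly orthogonal to every
# clean activation with high probability -- this is what hides the edit.
trigger = rng.normal(size=d)
trigger /= np.linalg.norm(trigger)

payload = rng.normal(size=d)
payload /= np.linalg.norm(payload)          # direction the attacker wants to inject

# Rank-one edit: responds strongly to the trigger, barely to anything else.
W_edited = W + 5.0 * np.outer(payload, trigger)

clean = rng.normal(size=(1000, d)) / np.sqrt(d)   # surrogate clean activations, norm ~ 1
drift = np.linalg.norm(clean @ W_edited.T - clean @ W.T, axis=1)
print("median drift on clean inputs:", np.median(drift))                        # ~0.05
print("output shift on the trigger:", np.linalg.norm((W_edited - W) @ trigger)) # 5.0
```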
How adversarial attacks can disrupt seemingly stable accurate classifiers
Oliver J. Sutton, Qinghua Zhou, Ivan Y. Tyukin, Alexander N. Gorban, Alexander Bastounis, Desmond J. Higham
Neural Networks, 2024
We demonstrate a fundamental feature of classifiers working with high-dimensional input data: an 'accurate' model can be simultaneously susceptible to adversarial attacks and robust to random perturbations of the input. We provide a theoretical framework and extensive empirical verification showing that additive noise during training or testing is inefficient for eradicating or detecting adversarial examples, and insufficient for certifying robustness!
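A minimal numpy sketch of this phenomenon (a toy illustration under assumed parameters, not the paper's construction): in high dimensions, a linear classifier can be almost immune to random noise of a given norm while a gradient-aligned perturbation of the very same norm flips nearly every decision.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, margin, eps = 2000, 1000, 3.0, 10.0

w = rng.normal(size=d); w /= np.linalg.norm(w)          # classifier: sign(w @ x)
y = rng.choice([-1.0, 1.0], size=n)
X = rng.normal(size=(n, d)) + np.outer(y * margin, w)   # ||x|| ~ sqrt(d) ~ 45

acc = lambda Z: np.mean(np.sign(Z @ w) == y)

# Random perturbations of norm eps: almost no component along w in high d.
U = rng.normal(size=(n, d))
U = eps * U / np.linalg.norm(U, axis=1, keepdims=True)

# Adversarial perturbations of the *same* norm: fully aligned with -y * w.
A = -eps * np.outer(y, w)

print("clean accuracy:       ", acc(X))      # ~0.999
print("random-noise accuracy:", acc(X + U))  # ~0.998 (robust to noise)
print("adversarial accuracy: ", acc(X + A))  # ~0.0   (same-norm attack succeeds)
```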
Multiple-instance ensemble for construction of deep heterogeneous committees for high-dimensional low-sample-size data
Qinghua Zhou, Shuihua Wang, Hengde Zhu, Xin Zhang, Yu-Dong Zhang
Neural Networks, 2023
Learning in the high-dimensional, low-sample-size (HDLS) domain is recognised as one of the core challenges for modern AI. Here we introduce a novel stacking method that uses attention pooling mechanisms to build ensembles and cascades, and we report extensive empirical experiments over a range of HDLS datasets in the medical domain.
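For a sense of the pooling mechanism, here is a minimal numpy sketch in the spirit of gated attention pooling over ensemble-member embeddings (dimensions, names, and parameterisation are illustrative assumptions, not the paper's exact architecture): each member's representation receives a learned softmax weight, and the committee representation is the weighted sum.

```python
import numpy as np

def attention_pool(H, V, q):
    """Attention pooling over m member embeddings.
    H: (m, d) member embeddings; V: (k, d) and q: (k,) are learned parameters."""
    scores = np.tanh(H @ V.T) @ q     # (m,) one relevance score per member
    a = np.exp(scores - scores.max())
    a /= a.sum()                      # softmax attention weights, sum to 1
    return a @ H, a                   # pooled (d,) representation and the weights

rng = np.random.default_rng(2)
m, d, k = 8, 64, 32                   # members, embedding dim, attention dim
H = rng.normal(size=(m, d))           # e.g. per-member instance embeddings
V, q = rng.normal(size=(k, d)), rng.normal(size=k)

pooled, weights = attention_pool(H, V, q)
print(weights.round(3), pooled.shape) # weights expose each member's contribution
```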