This is the official website of the IEEE Trojan Removal Competition (IEEE TRC’22), associated with the ICLR’23 workshop. In this competition, we challenge you to design efficient and effective end-to-end neural network Trojan removal techniques that can mitigate attacks regardless of trigger designs, poisoning settings, datasets, model architectures, etc. Neural Trojans are a growing concern for the security of ML systems. There are ongoing challenges [1,2] aimed at detecting whether a pre-trained model is poisoned. However, how to turn a poisoned model back into a benign one, referred to as Trojan removal, remains an open question. We ask our participants to explore an essential deep-learning research problem: is it possible to develop general, effective, and efficient white-box Trojan removal techniques for pre-trained models?
Prizes: There is an $8,000 prize pool for this competition’s single track. The first-place team members will also be invited to co-author a publication summarizing the competition results and to give a short talk at the ICLR’23 workshop on Backdoor Attacks and Defenses in Machine Learning (BANDS). Our current planned procedures for distributing the pool are here.
Adopting or fine-tuning third-party pre-trained models, e.g., vision transformers, has become a standard practice in many machine learning applications, as training from scratch requires intensive computational power and large datasets. This practice has exposed Machine Learning (ML) systems to an emerging security concern – neural Trojans (or backdoor attacks), where attackers embed predefined triggers into a poisoned model (e.g., a third-party pre-trained model). The poisoned model behaves like a benign model as long as the trigger is absent, but a sample that would otherwise be correctly classified is misclassified into the attacker-desired target class(es) once the poisoned model observes the trigger. Neural Trojans can severely impair ML models’ integrity, and there is still no reliable countermeasure.
Neural Trojans have developed rapidly into many attack variants emphasizing different forms of stealthiness and different attack goals. Dirty-label neural Trojans manipulate both the label and the features of a sample. These attacks have evolved from using a sample-independent visible pattern as the trigger to more stealthy and powerful attacks with sample-specific or visually imperceptible triggers. Clean-label neural Trojans ensure that the manipulated features remain semantically consistent with the corresponding labels. From the perspective of attack goals, neural Trojans can also be divided into all-to-all attacks, where all labels are targeted using a single trigger; all-to-one attacks, where the trigger maps every input to a single target label; and one-to-one attacks, where the trigger is only effective on one pair of classes. It is worth noting that multiple neural Trojans, each accounting for a different attack behavior, can be inserted into the same model.
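To make this taxonomy concrete, the sketch below shows how a dirty-label, patch-trigger poisoned sample might be constructed in PyTorch. The function names, the patch placement, and the (c + 1) mod K convention used for all-to-all targets are illustrative assumptions, not part of the competition specification.

```python
import torch

def apply_patch_trigger(x, patch_size=3, patch_value=1.0):
    """Stamp a small square trigger onto the bottom-right corner of a batch of images.

    x: float tensor of shape (N, C, H, W) with values in [0, 1].
    """
    x = x.clone()
    x[:, :, -patch_size:, -patch_size:] = patch_value
    return x

def poison_labels(y, num_classes, mode="all-to-one", target_class=0):
    """Map original labels to attack targets.

    all-to-one: every triggered sample is relabeled to a single target class.
    all-to-all: each class c is relabeled to (c + 1) mod num_classes
                (one common convention), while sharing a single trigger.
    """
    if mode == "all-to-one":
        return torch.full_like(y, target_class)
    elif mode == "all-to-all":
        return (y + 1) % num_classes
    raise ValueError(f"unknown mode: {mode}")

# Example (hypothetical): poison 10% of a clean batch (x, y) in a dirty-label fashion.
# poison_mask = torch.rand(len(y)) < 0.1
# x[poison_mask] = apply_patch_trigger(x[poison_mask])
# y[poison_mask] = poison_labels(y[poison_mask], num_classes=10)
```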
Existing defenses against neural Trojans can be divided into four categories:
Poisoned sample detection via outlier detection on functionalities or artifacts, which relies on modeling the distribution of clean samples.
Poisoned model identification, which determines whether a given model is backdoored. This line of work is also the setting adopted by the existing competitions [1,2].
Robust training via differential privacy or re-designing the training pipeline. This line of work aims to withstand or mitigate the impact of neural Trojans during training but may suffer from low clean accuracy or erratic performance.
Backdoor removal via trigger synthesis or preprocessing and fine-tuning. This line of work serves as the fundamental solution given a poisoned model. However, there is still no satisfying solution that attains robust results across different datasets and triggers with minimal impact on model performance. A naive fine-tuning baseline of this kind is sketched after this list.
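As an illustration of the last category, the sketch below is a naive preprocessing-and-fine-tuning style baseline: fine-tune the suspected-poisoned model on a small set of clean samples. All names and hyperparameters are assumptions for illustration; this is not a reference solution, and such naive fine-tuning is known to be insufficient against many trigger designs.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def finetune_removal(model, clean_dataset, epochs=10, lr=1e-3, batch_size=128, device="cpu"):
    """Fine-tune a suspected-poisoned model on a small clean dataset.

    Only a naive baseline: with enough clean data and a suitable learning rate,
    fine-tuning can weaken simple triggers, but it often fails against stronger
    attacks or degrades clean accuracy.
    """
    model = model.to(device).train()
    loader = DataLoader(clean_dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model.eval()
```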
IEEE TRC’22 aims to fill this gap by promoting the development of backdoor removal, the fundamental solution to neural Trojan attacks given a trained poisoned model.
These rules are an initial set; if an urgent change is needed, we will require participants’ consent during registration. If an unanticipated situation arises, we will implement a fair solution, ideally through participant consensus.
[1] NeurIPS’22 Trojan Detection Challenge: https://trojandetection.ai/
[2] TrojAI: https://pages.nist.gov/trojai/docs/about.html
IEEE TRC’22 is supported by funding granted to the IEEE Smart Computing STC (awarded by the IEEE Computer Society Planning Committee for Emerging Techniques 2022, Dakota State University #845360).
Please contact Yi Zeng or Ruoxi Jia if you have any questions.