Neural networks are increasingly trained on private datasets. While it is known in the academic literature that models could leak details of their training datasets, it is not well understood whether this happens in practice. We show that it does by introducing the first practical training data extraction attack on a production neural language model. With query access to GPT-2 (trained on 40GB of text), we can extract hundreds of individual examples that were used to train the model. These extracted examples include personally identifiable information, IRC conversations, copyrighted code, and 128-bit UUIDs. Most troubling, we found that extraction attacks become much easier as models become larger. Deployed models that train on private datasets must begin to consider privacy-preserving techniques to prevent these attacks.
Avast Research Lab
The Avast Research Lab runs innovative projects in all areas of digital safety, from advanced threat detection to better privacy and identity protection, and much more. We employ state-of-the-art AI solutions to counter the ever-accelerating growth of emergent threats through a combination of in-house expertise, academic cooperation, and publicly available research.