Neural networks are increasingly trained on private datasets. While the research literature has established that models can, in principle, leak details of their training data, it is not well understood whether this happens in practice. We show that it does by introducing the first practical training data extraction attack on a production neural language model. With query access to GPT-2 (trained on 40GB of text), we can extract hundreds of individual examples that were used to train the model. These extracted examples include personally identifiable information, IRC conversations, copyrighted code, and 128-bit UUIDs. Most troubling, we found that extraction attacks become much easier as models become larger. Those who deploy models trained on private datasets must begin to consider privacy-preserving techniques to prevent these attacks.