GDPR Technical Series #2: Element-Level Protection for Anonymization and Pseudonymization
In the first blog post of the GDPR series, we discussed the differences between anonymization and pseudonymization. In this post, we look at how these two overall protection approaches can be achieved through various element-level protection techniques.
Overall Protection vs. Element-Level Techniques
Most data security tools today offer a set of element-level transformation techniques to achieve the overall goal of protecting data sets. These are primarily data masking, encryption, and tokenization. Data masking is a one-way, permanent transformation of the original data element. Masking can be performed using several different methods, each of which affects the statistical properties of the resulting data set in a different way. Encryption is a reversible transformation of the original data element. Depending on the algorithm used, encryption may or may not retain the format of the original element. Tokenization replaces the original data element with a “token,” which can be used to retrieve the original value if needed. Tokenization can be vaulted or vaultless. Vaulted tokenization stores the mapping from the token to the original value in a storage system, the vault. Vaultless tokenization, on the other hand, is typically achieved by means of masking or encryption techniques.
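As a rough illustration of the difference between a one-way and a retrievable transformation, here is a minimal Python sketch of masking and vaulted tokenization. The names `mask_email` and `TokenVault` are hypothetical, the "vault" is just an in-memory dictionary, and the masking rule (keep the first character and the domain) is only one of many possible methods. It is not how any particular product implements these techniques.

```python
import secrets


def mask_email(email: str) -> str:
    """One-way masking: keep the first character and the domain, hide the rest."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


class TokenVault:
    """Vaulted tokenization: the token-to-value mapping is kept in a store.

    Here the store is an in-memory dict (an assumption for illustration);
    a real vault would be a separately secured storage system.
    """

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = secrets.token_hex(8)
        while token in self._token_to_value:  # extremely unlikely collision
            token = secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Retrieval is only possible for someone with access to the vault.
        return self._token_to_value[token]


masked = mask_email("alice@example.com")  # the hidden characters cannot be recovered
vault = TokenVault()
token = vault.tokenize("123-45-6789")
original = vault.detokenize(token)
```

The masked value carries no information from which the hidden characters could be restored, whereas the token can be exchanged for the original value by anyone who can query the vault. That is why the vault itself needs protecting.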
While the literature on protection techniques clearly identifies data masking and tokenization as techniques to achieve anonymization and pseudonymization, there is some confusion as to whether encryption is a technique that satisfies pseudonymization or whether it is an entirely separate high-level protection option. The GDPR, for instance in Articles 6 and 32, refers to “encryption and pseudonymization,” implying that they are two distinct techniques. The GDPR guide from the UK Information Commissioner’s Office (ICO) echoes the GDPR wording.
However, other references, including the pre-GDPR Article 29 WP “Opinion 5/2014 on Anonymization Techniques,” view encryption as a means to achieve pseudonymization or anonymization. The reasoning and context of the references to encryption in the GDPR text and the ICO guidance, while not precise, appear to imply bulk encryption. Element-level encryption, where only the personal data element is encrypted, instead fits as a third option alongside masking and tokenization as a means to achieve pseudonymization or anonymization. Since this blog post is about element-level encryption, we treat encryption as a means of achieving pseudonymization and anonymization.
Two other considerations apply across the protection techniques: reversibility and format preservation. Because encrypted elements are reversible, encryption requires additional security for key management, in contrast to masking. Similarly, additional security considerations exist for tokenization in those cases where it is reversible. Certain use cases require format-preserving protection techniques. This is particularly true when the data storage is strongly typed (as in RDBMSs) and the applications require that the type be retained once protection techniques are applied. Figure 1 below summarizes the mapping of element-level techniques to the overall protection goals and considerations.
Overall Protection Goal: Anonymization, Pseudonymization
Element-Level Techniques: Masking, Redaction, Encryption, Tokenization
Other Overall Considerations: Reversibility, Format-Preservation
Element-Level Technique | Anonymization | Pseudonymization | Reversible | Format-Preserving |
---|---|---|---|---|
Masking | Yes | Yes | No | Possible with format-preserving masking |
Redaction | Yes | Yes | No | Possible |
Encryption | Yes | Yes | Yes | Possible with format-preserving encryption |
Tokenization | Yes | Yes | Yes | Possible |
Fig 1 — Mapping element-level techniques to overall protection considerations
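To make the format-preservation column in Figure 1 more concrete, here is a minimal Python sketch of one possible format-preserving masking method: each digit is replaced with a random digit and each letter with a random letter of the same case, while separators are left untouched. The function name `format_preserving_mask` is hypothetical, and random substitution is just one assumed masking method, chosen because it keeps the length and character classes that a strongly typed column or a validating application might expect.

```python
import secrets
import string


def format_preserving_mask(value: str) -> str:
    """Replace each ASCII digit or letter with a random one of the same class.

    Separators and any other characters are kept, so the length and layout
    of the value are unchanged. The substitution is random and not stored,
    so the transformation is one-way.
    """
    masked = []
    for ch in value:
        if ch in string.digits:
            masked.append(secrets.choice(string.digits))
        elif ch in string.ascii_uppercase:
            masked.append(secrets.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            masked.append(secrets.choice(string.ascii_lowercase))
        else:
            masked.append(ch)
    return "".join(masked)


card = "4111-1111-1111-1111"
masked_card = format_preserving_mask(card)  # still 19 characters, dashes in place
```

Because the replacement characters are drawn at random and no mapping is kept, this sketch corresponds to the "Masking" row of the table: format-preserving but not reversible. A reversible alternative would need format-preserving encryption or tokenization, along with the key or vault security discussed above.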
Element-level protection techniques satisfy the pseudonymization and anonymization requirements of the GDPR. However, one crucial gap remains: finding which elements in the enterprise’s vast data stores need to be protected. That step is the subject of the next blog post in our GDPR series.