
Data Masking

The data masking utility can encrypt, decrypt, or irreversibly erase sensitive information to protect data confidentiality.

stateDiagram-v2
    direction LR
    LambdaFn: Your Lambda function
    DataMasking: DataMasking
    Operation: Possible operations
    Input: Sensitive value
    Erase: <strong>Erase</strong>
    Encrypt: <strong>Encrypt</strong>
    Decrypt: <strong>Decrypt</strong>
    Provider: AWS Encryption SDK provider
    Result: Data transformed <i>(erased, encrypted, or decrypted)</i>

    LambdaFn --> DataMasking
    DataMasking --> Operation

    state Operation {
        [*] --> Input
        Input --> Erase: Irreversible
        Input --> Encrypt
        Input --> Decrypt
        Encrypt --> Provider
        Decrypt --> Provider
    }

    Operation --> Result

Key features

  • Encrypt, decrypt, or irreversibly erase data with ease
  • Erase sensitive information in one or more fields within nested data
  • Seamless integration with AWS Encryption SDK for industry and AWS security best practices

Terminology

Erasing replaces sensitive information irreversibly with a non-sensitive placeholder (*****), or with a customized mask. This operation replaces data in-memory, making it a one-way action.

Encrypting transforms plaintext into ciphertext using an encryption algorithm and a cryptographic key. It allows you to encrypt any sensitive data, so that only allowed personnel can decrypt it. Learn more about encryption here.

Decrypting transforms ciphertext back into plaintext using a decryption algorithm and the correct decryption key.

Encryption context is non-secret key=value data used for authentication, such as tenant_id:<id>. It adds extra security by binding the encrypted data to a context that must be confirmed when decrypting.

Encrypted message is a portable data structure that includes encrypted data along with copies of the encrypted data key. It includes everything the Encryption SDK needs to validate authenticity and integrity, and to decrypt with the right master key.

Envelope encryption uses two different keys to encrypt data safely: a master key and a data key. The data key encrypts the plaintext, and the master key encrypts the data key. It simplifies key management (you own the master key), limits the impact of a compromise to a single data key, and scales better with large data volumes.

graph LR
    M(Master key) --> |Encrypts| D(Data key)
    D(Data key) --> |Encrypts| S(Sensitive data)
Envelope encryption visualized.
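To make the two-key relationship concrete, here is a minimal sketch using only the Python standard library. The helper names (`toy_cipher`, `envelope_encrypt`, `envelope_decrypt`) are hypothetical and not part of Powertools or the AWS Encryption SDK. The cipher is a toy XOR keystream chosen so the example runs anywhere; it is not secure and only illustrates which key encrypts what.

```python
import hashlib
import secrets


def _keystream(key: bytes, length: int) -> bytes:
    """Derive a deterministic keystream from a key (toy construction, NOT secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice with the same key restores the input."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def envelope_encrypt(master_key: bytes, plaintext: bytes) -> dict:
    data_key = secrets.token_bytes(32)  # a fresh data key for this message
    return {
        # The data key encrypts the sensitive data
        "ciphertext": toy_cipher(data_key, plaintext),
        # The master key encrypts the data key, which travels with the message
        "encrypted_data_key": toy_cipher(master_key, data_key),
    }


def envelope_decrypt(master_key: bytes, message: dict) -> bytes:
    # Recover the data key first, then use it to recover the data
    data_key = toy_cipher(master_key, message["encrypted_data_key"])
    return toy_cipher(data_key, message["ciphertext"])


master_key = secrets.token_bytes(32)
message = envelope_encrypt(master_key, b"4111 1111 1111 1111")
recovered = envelope_decrypt(master_key, message)
```

Only the master key needs long-term protection; the plaintext data key never leaves the function that created it.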

Getting started

Tip

All examples shared in this documentation are available within the project repository.

Install

Add aws-lambda-powertools[datamasking] as a dependency in your preferred tool: e.g., requirements.txt, pyproject.toml. This will install the AWS Encryption SDK.

AWS Encryption SDK contains non-Python dependencies. This means you should use AWS SAM CLI or official build container images when building your application for AWS Lambda. Local development should work as expected.

Required resources

By default, we use AWS Key Management Service (KMS) for encryption and decryption operations.

Before you start, you will need a KMS symmetric key to encrypt and decrypt your data. Your Lambda function will need read and write access to it.

NOTE. We recommend setting a minimum of 1024MB of memory (CPU intensive), and separate Lambda functions for encrypt and decrypt. For more information, you can see the full reports of our load tests and traces.

AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Description: >
  Powertools for AWS Lambda (Python) data masking example

Globals: # https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-specification-template-anatomy-globals.html
  Function:
    Timeout: 5
    Runtime: python3.11
    Tracing: Active
    Environment:
      Variables:
        POWERTOOLS_SERVICE_NAME: PowertoolsHelloWorld
        POWERTOOLS_LOG_LEVEL: INFO
        KMS_KEY_ARN: !GetAtt DataMaskingMasterKey.Arn

# In production, we recommend you split up the encrypt and decrypt for fine-grained security.
# For example, one function can act as the encryption proxy via HTTP requests, data pipeline, etc.,
# while only authorized personnel can call decrypt via a separate function.
Resources:
  DataMaskingEncryptFunctionExample:
    Type: AWS::Serverless::Function
    Properties:
      Handler: data_masking_function_example.lambda_handler
      CodeUri: ../src
      Description: Data Masking encryption function
      # Cryptographic operations demand more CPU. CPU is proportionally allocated based on memory size.
      # We recommend allocating a minimum of 1024MB of memory.
      MemorySize: 1024

  # DataMaskingDecryptFunctionExample:
  #   Type: AWS::Serverless::Function
  #   Properties:
  #     Handler: data_masking_function_decrypt.lambda_handler
  #     CodeUri: ../src
  #     Description: Data Masking decryption function
  #     MemorySize: 1024

  # KMS Key
  DataMaskingMasterKey:
    Type: "AWS::KMS::Key"
    Properties:
      Description: KMS Key for encryption and decryption using Powertools for AWS Lambda Data masking feature
      # KMS Key supports both IAM Resource Policies and Key Policies
      # For more details: https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html
      KeyPolicy:
        Version: "2012-10-17"
        Id: data-masking-enc-dec
        Statement:
          # For security reasons, ensure your KMS Key has at least one administrator.
          # In this example, the root account is granted administrator permissions.
          # However, we recommend configuring specific IAM Roles for enhanced security in production.
          - Effect: Allow
            Principal:
              AWS: !Sub "arn:aws:iam::${AWS::AccountId}:root" # (1)
            Action: "kms:*"
            Resource: "*"
          # We must grant Lambda's IAM Role access to the KMS Key
          - Effect: Allow
            Principal:
              AWS: !GetAtt DataMaskingEncryptFunctionExampleRole.Arn # (2)
            Action:
              - kms:Decrypt # to decrypt encrypted data key
              - kms:GenerateDataKey # to create a unique and random data key for encryption
              # Encrypt permission is required only when using multiple keys
              - kms:Encrypt # (3)
            Resource: "*"
  1. Key policy examples using IAM Roles
  2. SAM generated CloudFormation Resources
  3. Required only when using multiple keys

Erasing data

Erasing removes the original data and replaces it with *****. Erased data cannot be recovered, and the data type changes to str. The exception is data of an Iterable type (list, tuple, set): in that case, the method returns a new object of the same type as the input, with each element replaced by the string *****.

from __future__ import annotations

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.typing import LambdaContext

logger = Logger()
data_masker = DataMasking()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> dict:
    data: dict = event.get("body", {})

    logger.info("Erasing fields email, address.street, and company_address")

    erased = data_masker.erase(data, fields=["email", "address.street", "company_address"])  # (1)

    return erased
  1. See working with nested data to learn more about the fields parameter.

    If we omit the fields parameter, the entire dictionary will be erased and replaced with *****.
{"body":{"id":1,"name":"John Doe","age":30,"email":"johndoe@example.com","address":{"street":"123 Main St","city":"Anytown","state":"CA","zip":"12345"},"company_address":{"street":"456 ACME Ave","city":"Anytown","state":"CA","zip":"12345"}}}
{"id":1,"name":"John Doe","age":30,"email":"*****","address":{"street":"*****","city":"Anytown","state":"CA","zip":"12345"},"company_address":"*****"}
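The type rule described in this section (everything becomes a str, except list, tuple, and set, which keep their type) can be sketched in a few lines. `erase_value` is a hypothetical helper written for illustration only; it is not the library's implementation.

```python
from typing import Any

MASK = "*****"


def erase_value(value: Any) -> Any:
    """Replace a value with the mask, keeping list/tuple/set container types."""
    if isinstance(value, (list, tuple, set)):
        # Build a new container of the same type, one mask per original element
        return type(value)(MASK for _ in value)
    # Any other type (int, str, dict, ...) collapses to a plain string
    return MASK


erased_int = erase_value(30)
erased_list = erase_value(["a", "b"])
erased_tuple = erase_value(("a", "b"))
```

Note that in this sketch a set collapses to a single element, since every masked element is the same string.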

Custom masking

The erase method also supports additional flags for more advanced and flexible masking:

dynamic_mask (bool): Enables dynamic masking behavior when set to True, maintaining the original length and structure of the text while replacing its characters with *.

Expression: data_masker.erase(data, fields=["address.street"], dynamic_mask=True)

Field result: 'street': '*** **** **'

custom_mask (str): Specifies a simple pattern for masking data. This pattern is applied directly to the input string, replacing all the original characters. For example, with a custom_mask of "XX-XX" applied to "12345", the result would be "XX-XX".

Expression: data_masker.erase(data, fields=["address.zip"], custom_mask="XX")

Field result: 'zip': 'XX'

regex_pattern and mask_format (str): regex_pattern defines a regular expression used to identify the parts of the input string that should be masked, which allows for more complex and flexible masking rules. It is used together with mask_format, which specifies the format to use when replacing the parts matched by regex_pattern. mask_format can include placeholders (like \1, \2) that refer to captured groups in the regex pattern, so some parts of the original string can be preserved.

Expression: data_masker.erase(data, fields=["email"], regex_pattern=r"(.)(.*)(@.*)", mask_format=r"\1****\3")

Field result: 'email': 'j****@example.com'
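The masking flags above can be approximated with the standard `re` module. This is a sketch only: `dynamic_mask` and `regex_mask` are hypothetical stand-ins, and it assumes dynamic masking keeps whitespace in place, which is what the field result shown earlier suggests.

```python
import re


def dynamic_mask(text: str) -> str:
    """Replace every non-whitespace character with '*', keeping length and spacing."""
    return re.sub(r"\S", "*", text)


def regex_mask(text: str, regex_pattern: str, mask_format: str) -> str:
    """Rewrite the matched text using mask_format, which may reference captured groups."""
    return re.sub(regex_pattern, mask_format, text)


masked_street = dynamic_mask("123 Main St")
# Group 1 keeps the first character, group 3 keeps the domain, group 2 is dropped
masked_email = regex_mask("johndoe@example.com", r"(.)(.*)(@.*)", r"\1****\3")
```

Here `masked_street` is `*** **** **` and `masked_email` is `j****@example.com`, matching the field results in this section.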

masking_rules (dict): Allows you to apply different masking rules (flags) for each data field.

from __future__ import annotations

from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.typing import LambdaContext

data_masker = DataMasking()


def lambda_handler(event: dict, context: LambdaContext) -> dict:
    data: dict = event.get("body", {})

    # Masking rules for each field
    masking_rules = {
        "email": {"regex_pattern": "(.)(.*)(@.*)", "mask_format": r"\1****\3"},
        "age": {"dynamic_mask": True},
        "address.zip": {"custom_mask": "xxx"},
        "$.other_address[?(@.postcode > 12000)]": {"custom_mask": "Masked"},
    }

    result = data_masker.erase(data, masking_rules=masking_rules)

    return result

{"body":{"id":1,"name":"Jane Doe","age":30,"email":"janedoe@example.com","address":{"street":"123 Main St","city":"Anytown","state":"CA","zip":"12345","postcode":12345,"product":{"name":"Car"}},"other_address":[{"postcode":11345,"street":"123 Any Drive"},{"postcode":67890,"street":"100 Main Street,"}],"company_address":{"street":"456 ACME Ave","city":"Anytown","state":"CA","zip":"12345"}}}
{"id":1,"name":"Jane Doe","age":"**","email":"j****@example.com","address":{"street":"123 Main St","city":"Anytown","state":"CA","zip":"xxx","postcode":12345,"product":{"name":"Car"}},"other_address":[{"postcode":11345,"street":"123 Any Drive"},"Masked"],"company_address":{"street":"456 ACME Ave","city":"Anytown","state":"CA","zip":"12345"}}

Encrypting data

About static typing and encryption

Encrypting data may lead to a different data type, as it always transforms into a string (<ciphertext>).

To encrypt, you will need an encryption provider. Here, we will use AWSEncryptionSDKProvider.

Under the hood, we delegate a number of operations to AWS Encryption SDK: authenticating, creating a portable encrypted message, and encrypting the actual data.

from __future__ import annotations

import os

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import (
    AWSEncryptionSDKProvider,
)
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN = os.getenv("KMS_KEY_ARN", "")

encryption_provider = AWSEncryptionSDKProvider(keys=[KMS_KEY_ARN])  # (1)
data_masker = DataMasking(provider=encryption_provider)

logger = Logger()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> dict:
    data: dict = event.get("body", {})

    logger.info("Encrypting the whole object")

    encrypted = data_masker.encrypt(data)

    return {"body": encrypted}
  1. You can use more than one KMS Key for higher availability but increased latency.

    Encryption SDK will ensure the data key is encrypted with both keys.
{"body":{"id":1,"name":"John Doe","age":30,"email":"johndoe@example.com","address":{"street":"123 Main St","city":"Anytown","state":"CA","zip":"12345"},"company_address":{"street":"456 ACME Ave","city":"Anytown","state":"CA","zip":"12345"}}}
{"body":"AgV4uF5K2YMtNhYrtviTwKNrUHhqQr73l/jNfukkh+qLOC8AXwABABVhd3MtY3J5cHRvLXB1YmxpYy1rZXkAREEvcjEyaFZHY1R5cjJuTDNKbTJ3UFA3R3ZjaytIdi9hekZqbXVUb25Ya3J5SzFBOUlJZDZxZXpSR1NTVnZDUUxoZz09AAEAB2F3cy1rbXMAS2Fybjphd3M6a21zOnVzLWVhc3QtMToyMDA5ODQxMTIzODY6a2V5LzZkODJiMzRlLTM2NjAtNDRlMi04YWJiLTdmMzA1OGJlYTIxMgC4AQIBAHjxYXAO7wQGd+7qxoyvXAajwqboF5FL/9lgYUNJTB8VtAHBP2hwVgw+zypp7GoMNTPAAAAAfjB8BgkqhkiG9w0BBwagbzBtAgEAMGgGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMx/B25MTgWwpL7CmuAgEQgDtan3orAOKFUfyNm3v6rFcglb+BVVVDV71fj4aRljhpg1ixsYFaKsoej8NcwRktIiWE+mw9XmTEVb6xFQIAABAA9DeLzlRaRQgTcXMJG0iBu/YTyyDKiROD+bU1Y09X9RBz5LA1nWIENJKq2seAhNSB/////wAAAAEAAAAAAAAAAAAAAAEAAAEBExLJ9wI4n7t+wyPEEP4kjYFBdkmNuLLsVC2Yt8mv9Y1iH2G+/g9SaIcdK57pkoW0ECpBxZVOxCuhmK2s74AJCUdem9McjS1waUKyzYTi9vv2ySNBsABIDwT990rE7jZJ3tEZAqcWZg/eWlxvnksFR/akBWZKsKzFz6lF57+cTgdISCEJRV0E7fcUeCuaMaQGK1Qw2OCmIeHEG5j5iztBkZG2IB2CVND/AbxmDUFHwgjsrJPTzaDYSufcGMoZW1A9X1sLVfqNVKvnOFP5tNY7kPF5eAI9FhGBw8SjTqODXz4k6zuqzy9no8HtXowP265U8NZ5VbVTd/zuVEbZyK5KBqzP1sExW4RhnlpXMoOs9WSuAGcwZQIxANTeEwb9V7CacV2Urt/oCqysUzhoV2AcT2ZjryFqY79Tsg+FRpIx7cBizL4ieRzbhQIwcRasNncO5OZOcmVr0MqHv+gCVznndMgjXJmWwUa7h6skJKmhhMPlN0CsugxtVWnD"}

Decrypting data

About static typing and decryption

Decrypting data may lead to a different data type, as encrypted data is always a string (<ciphertext>).

To decrypt, you will need an encryption provider. Here, we will use AWSEncryptionSDKProvider.

Under the hood, we delegate a number of operations to AWS Encryption SDK: verifying authenticity and integrity, and decrypting the actual ciphertext.

NOTE. Decryption only works with KMS Key ARN.

from __future__ import annotations

import os

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import AWSEncryptionSDKProvider
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN = os.getenv("KMS_KEY_ARN", "")  # (1)

encryption_provider = AWSEncryptionSDKProvider(keys=[KMS_KEY_ARN])  # (2)
data_masker = DataMasking(provider=encryption_provider)

logger = Logger()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> dict:
    data: dict = event.get("body", {})

    logger.info("Decrypting whole object")

    decrypted = data_masker.decrypt(data)

    return decrypted
  1. Note that KMS key alias or key ID won't work.
  2. You can use more than one KMS Key for higher availability but increased latency.

    Encryption SDK will call Decrypt API with all master keys when trying to decrypt the data key.
{"body":"AgV4uF5K2YMtNhYrtviTwKNrUHhqQr73l/jNfukkh+qLOC8AXwABABVhd3MtY3J5cHRvLXB1YmxpYy1rZXkAREEvcjEyaFZHY1R5cjJuTDNKbTJ3UFA3R3ZjaytIdi9hekZqbXVUb25Ya3J5SzFBOUlJZDZxZXpSR1NTVnZDUUxoZz09AAEAB2F3cy1rbXMAS2Fybjphd3M6a21zOnVzLWVhc3QtMToyMDA5ODQxMTIzODY6a2V5LzZkODJiMzRlLTM2NjAtNDRlMi04YWJiLTdmMzA1OGJlYTIxMgC4AQIBAHjxYXAO7wQGd+7qxoyvXAajwqboF5FL/9lgYUNJTB8VtAHBP2hwVgw+zypp7GoMNTPAAAAAfjB8BgkqhkiG9w0BBwagbzBtAgEAMGgGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMx/B25MTgWwpL7CmuAgEQgDtan3orAOKFUfyNm3v6rFcglb+BVVVDV71fj4aRljhpg1ixsYFaKsoej8NcwRktIiWE+mw9XmTEVb6xFQIAABAA9DeLzlRaRQgTcXMJG0iBu/YTyyDKiROD+bU1Y09X9RBz5LA1nWIENJKq2seAhNSB/////wAAAAEAAAAAAAAAAAAAAAEAAAEBExLJ9wI4n7t+wyPEEP4kjYFBdkmNuLLsVC2Yt8mv9Y1iH2G+/g9SaIcdK57pkoW0ECpBxZVOxCuhmK2s74AJCUdem9McjS1waUKyzYTi9vv2ySNBsABIDwT990rE7jZJ3tEZAqcWZg/eWlxvnksFR/akBWZKsKzFz6lF57+cTgdISCEJRV0E7fcUeCuaMaQGK1Qw2OCmIeHEG5j5iztBkZG2IB2CVND/AbxmDUFHwgjsrJPTzaDYSufcGMoZW1A9X1sLVfqNVKvnOFP5tNY7kPF5eAI9FhGBw8SjTqODXz4k6zuqzy9no8HtXowP265U8NZ5VbVTd/zuVEbZyK5KBqzP1sExW4RhnlpXMoOs9WSuAGcwZQIxANTeEwb9V7CacV2Urt/oCqysUzhoV2AcT2ZjryFqY79Tsg+FRpIx7cBizL4ieRzbhQIwcRasNncO5OZOcmVr0MqHv+gCVznndMgjXJmWwUa7h6skJKmhhMPlN0CsugxtVWnD"}
{"id":1,"name":"John Doe","age":30,"email":"johndoe@example.com","address":{"street":"123 Main St","city":"Anytown","state":"CA","zip":"12345"},"company_address":{"street":"456 ACME Ave","city":"Anytown","state":"CA","zip":"12345"}}

Encryption context for integrity and authenticity

For a stronger security posture, you can add metadata to each encryption operation and verify it during decryption. This is known as additional authenticated data (AAD). It is non-sensitive data that helps protect the authenticity and integrity of your encrypted data, and can even help prevent a confused deputy situation.

Important considerations you should know
  1. Exact match verification on decrypt. Be careful using random data like timestamps as encryption context if you can't provide them on decrypt.
  2. Only string values are supported. We will raise DataMaskingUnsupportedTypeError for non-string values.
  3. Use non-sensitive data only. When using KMS, encryption context is available as plaintext in AWS CloudTrail, unless you intentionally disabled KMS events.
from __future__ import annotations

import os

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import AWSEncryptionSDKProvider
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN = os.getenv("KMS_KEY_ARN", "")

encryption_provider = AWSEncryptionSDKProvider(keys=[KMS_KEY_ARN])
data_masker = DataMasking(provider=encryption_provider)

logger = Logger()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> str:
    data = event.get("body", {})

    logger.info("Encrypting whole object")

    encrypted: str = data_masker.encrypt(
        data,
        data_classification="confidential",  # (1)
        data_type="customer-data",
        tenant_id="a06bf973-0734-4b53-9072-39d7ac5b2cba",
    )

    return encrypted
  1. They must match on decrypt(), otherwise the operation will fail with DataMaskingContextMismatchError.
from __future__ import annotations

import os

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import AWSEncryptionSDKProvider
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN = os.getenv("KMS_KEY_ARN", "")

encryption_provider = AWSEncryptionSDKProvider(keys=[KMS_KEY_ARN])
data_masker = DataMasking(provider=encryption_provider)

logger = Logger()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> dict:
    data = event.get("body", {})

    logger.info("Decrypting whole object")

    decrypted: dict = data_masker.decrypt(
        data,
        data_classification="confidential",  # (1)
        data_type="customer-data",
        tenant_id="a06bf973-0734-4b53-9072-39d7ac5b2cba",
    )

    return decrypted
  1. They must match the values used on encrypt(), otherwise the operation will fail with DataMaskingContextMismatchError.
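To see why the context has to match exactly, here is a small standard-library sketch that binds a context to a ciphertext with an HMAC tag. This is not how the AWS Encryption SDK implements it, and `seal` and `verify` are hypothetical names. It only illustrates the two rules from this section: values must be strings, and any difference in the context makes verification fail.

```python
from __future__ import annotations

import hashlib
import hmac
import json


def _canonical(context: dict[str, str]) -> bytes:
    """Serialize the context deterministically; only string values are accepted."""
    for name, value in context.items():
        if not isinstance(value, str):
            raise TypeError(f"context value for {name!r} must be a string")
    return json.dumps(context, sort_keys=True).encode("utf-8")


def seal(key: bytes, ciphertext: bytes, **context: str) -> bytes:
    """Return an authentication tag covering both the ciphertext and its context."""
    return hmac.new(key, _canonical(context) + ciphertext, hashlib.sha256).digest()


def verify(key: bytes, ciphertext: bytes, tag: bytes, **context: str) -> bool:
    """True only when the same key, ciphertext, and context were used to seal."""
    return hmac.compare_digest(tag, seal(key, ciphertext, **context))


key = b"k" * 32  # placeholder key for the example
tag = seal(key, b"<ciphertext>", tenant_id="a06bf973", data_type="customer-data")

same_context = verify(key, b"<ciphertext>", tag, tenant_id="a06bf973", data_type="customer-data")
other_context = verify(key, b"<ciphertext>", tag, tenant_id="someone-else", data_type="customer-data")
```

In this sketch `same_context` is True and `other_context` is False. That is the reason values you cannot reproduce on decrypt, such as timestamps, make poor encryption context.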

Choosing parts of your data

Current limitations
  1. The fields parameter is not yet supported in encrypt and decrypt operations.
  2. We support JSON data types only - see data serialization for more details.

You can use the fields parameter with the dot notation . to choose one or more parts of your data to erase. This is useful when you want to keep data structure intact except the confidential fields.

When fields is present, erase behaves differently:

| Operation | Behavior | Example | Result |
| --------- | -------- | ------- | ------ |
| erase | Replace data while keeping collections type intact. | {"cards": ["a", "b"]} | {"cards": ["*****", "*****"]} |
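As a rough mental model for plain dot notation, the sketch below walks a dictionary along a dot-separated path and masks the value it finds. `erase_path` is a hypothetical helper, not the library's implementation. It does not handle JSONPath features such as wildcards or filters, and silently ignoring missing paths is a choice made for this sketch only.

```python
from __future__ import annotations

from typing import Any

MASK = "*****"


def erase_path(data: dict, path: str) -> None:
    """Erase the value at a dot-separated path, in place."""
    *parents, leaf = path.split(".")
    node: Any = data
    for key in parents:
        if not isinstance(node, dict) or key not in node:
            return  # sketch choice: ignore paths that do not exist
        node = node[key]
    if isinstance(node, dict) and leaf in node:
        value = node[leaf]
        if isinstance(value, (list, tuple, set)):
            # Keep the collection type, mask each element
            node[leaf] = type(value)(MASK for _ in value)
        else:
            node[leaf] = MASK


payload = {"cards": ["a", "b"], "address": {"postcode": 12345, "street": "123 Any Street"}}
erase_path(payload, "cards")
erase_path(payload, "address.postcode")
```

Everything outside the selected paths, such as `address.street` here, is left untouched.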

Here are common scenarios to best visualize how to use fields.

You want to erase data in the card_number field.

Expression: data_masker.erase(data, fields=["card_number"])

{"name":"Carlos","operation":"non sensitive","card_number":"1111 2222 3333 4444"}
{"name":"Carlos","operation":"non sensitive","card_number":"*****"}

You want to erase data in the postcode field.

Expression: data_masker.erase(data, fields=["address.postcode"])

{"name":"Carlos","operation":"non sensitive","card_number":"1111 2222 3333 4444","address":{"postcode":12345}}
{"name":"Carlos","operation":"non sensitive","card_number":"1111 2222 3333 4444","address":{"postcode":"*****"}}

You want to erase data in both postcode and street fields.

Expression: data_masker.erase(data, fields=["address.postcode", "address.street"])

{"name":"Carlos","operation":"non sensitive","card_number":"1111 2222 3333 4444","address":{"postcode":12345,"street":"123 Any Street"}}
{"name":"Carlos","operation":"non sensitive","card_number":"1111 2222 3333 4444","address":{"postcode":"*****","street":"*****"}}

You want to erase data under address field.

Expression: data_masker.erase(data, fields=["address"])

{"name":"Carlos","operation":"non sensitive","card_number":"1111 2222 3333 4444","address":[{"postcode":12345,"street":"123 Any Street","country":"United States","timezone":"America/La_Paz"},{"postcode":67890,"street":"100 Main Street","country":"United States","timezone":"America/Mazatlan"}]}
{"name":"Carlos","operation":"non sensitive","card_number":"1111 2222 3333 4444","address":["*****","*****"]}

You want to erase data under name field.

Expression: data_masker.erase(data, fields=["category..name"])

{"category":{"subcategory":{"brand":{"product":{"name":"Car"}}}}}
{"category":{"subcategory":{"brand":{"product":{"name":"*****"}}}}}

You want to erase data under the street field located at any index of the address list.

Expression: data_masker.erase(data, fields=["address[*].street"])

{"name":"Carlos","operation":"non sensitive","card_number":"1111 2222 3333 4444","address":[{"postcode":12345,"street":"123 Any Drive"},{"postcode":67890,"street":"100 Main Street,"}]}
{"name":"Carlos","operation":"non sensitive","card_number":"1111 2222 3333 4444","address":[{"postcode":12345,"street":"*****"},{"postcode":67890,"street":"*****"}]}

You want to erase data by slicing a list.

Expression: data_masker.erase(data, fields=["address[-1].street"])

{"name":"Carlos","operation":"non sensitive","card_number":"1111 2222 3333 4444","address":[{"postcode":12345,"street":"123 Any Street"},{"postcode":67890,"street":"100 Main Street"},{"postcode":78495,"street":"111 Any Drive"}]}
{"name":"Carlos","operation":"non sensitive","card_number":"1111 2222 3333 4444","address":[{"postcode":12345,"street":"123 Any Street"},{"postcode":67890,"street":"100 Main Street"},{"postcode":78495,"street":"*****"}]}

You want to erase data by selecting fields with a conditional expression.

Expression: data_masker.erase(data, fields=["$.address[?(@.postcode > 12000)]"])

$: Represents the root of the JSON structure.

.address: Selects the "address" property within the JSON structure.

(@.postcode > 12000): Specifies the condition that elements should meet. It selects elements where the value of the postcode property is greater than 12000.

{"name":"Carlos","operation":"non sensitive","card_number":"1111 2222 3333 4444","address":[{"postcode":12345,"street":"123 Any Drive"},{"postcode":67890,"street":"111 Main Street"},{"postcode":11111,"street":"100 Any Street"}]}
{"name":"Carlos","operation":"non sensitive","card_number":"1111 2222 3333 4444","address":[{"postcode":12345,"street":"*****"},{"postcode":67890,"street":"*****"},{"postcode":11111,"street":"100 Any Street"}]}

For comprehensive guidance on using JSONPath syntax, please refer to the official documentation available at jsonpath-ng

JSON

We also support data in JSON string format as input. We automatically deserialize it, then handle each field operation as expected.

Note that the return value will be the deserialized JSON (a dictionary) with your chosen fields updated, not a JSON string.

Expression: data_masker.erase(data, fields=["card_number", "address.postcode"])

'{"name":"Carlos","operation":"non sensitive","card_number":"1111 2222 3333 4444","address":{"postcode":12345}}'
{"name":"Carlos","operation":"non sensitive","card_number":"*****","address":{"postcode":"*****"}}

Advanced

Data serialization

Extended input support

We support Pydantic models, Dataclasses, and custom classes with dict() or __dict__ for input.

These types are automatically converted into dictionaries before masking and encrypting operations. Please note that we don't convert back to the original type, and the returned object will be a dictionary.

Before we traverse the data structure, we perform two important operations on input data:

  1. If JSON string, deserialize using default or provided deserializer.
  2. If dictionary or complex types, normalize into JSON to prevent traversing unsupported data types.
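The two steps above can be sketched with the standard library alone. `normalize` is a hypothetical function written for illustration, and the order in which it probes for dataclasses, `dict()`, and `__dict__` is an assumption rather than the library's documented behavior.

```python
from __future__ import annotations

import dataclasses
import json
from typing import Any, Callable


def normalize(
    data: Any,
    json_serializer: Callable[[Any], str] = json.dumps,
    json_deserializer: Callable[[str], Any] = json.loads,
) -> Any:
    # Step 1: a JSON string is deserialized with the default or provided deserializer
    if isinstance(data, str):
        return json_deserializer(data)

    # Extended inputs are turned into dictionaries first
    if dataclasses.is_dataclass(data) and not isinstance(data, type):
        data = dataclasses.asdict(data)
    elif callable(getattr(data, "dict", None)):
        data = data.dict()
    elif hasattr(data, "__dict__"):
        data = vars(data)

    # Step 2: a JSON round trip leaves only JSON data types (e.g. tuples become lists)
    return json_deserializer(json_serializer(data))


@dataclasses.dataclass
class User:
    name: str
    age: int


from_dataclass = normalize(User(name="powertools", age=42))
from_json_string = normalize('{"name": "powertools", "age": 42}')
from_dict = normalize({"scores": (1, 2)})
```

Both `from_dataclass` and `from_json_string` end up as the same plain dictionary, which is why the original type is not restored on return.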

For compatibility or performance, you can optionally pass your own JSON serializer and deserializer to replace json.dumps and json.loads respectively:

# Input: custom class exposing a dict() method
from aws_lambda_powertools.utilities.data_masking import DataMasking

data_masker = DataMasking()


class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def dict(self):
        return {"name": self.name, "age": self.age}


def lambda_handler(event, context):
    user = User("powertools", 42)
    return data_masker.erase(user, fields=["age"])
# Input: Pydantic model
from pydantic import BaseModel

from aws_lambda_powertools.utilities.data_masking import DataMasking

data_masker = DataMasking()


class User(BaseModel):
    name: str
    age: int


def lambda_handler(event, context):
    user = User(name="powertools", age=42)
    return data_masker.erase(user, fields=["age"])
# Input: dataclass
from dataclasses import dataclass

from aws_lambda_powertools.utilities.data_masking import DataMasking

data_masker = DataMasking()


@dataclass
class User:
    name: str
    age: int


def lambda_handler(event, context):
    user = User(name="powertools", age=42)
    return data_masker.erase(user, fields=["age"])
# Custom JSON serializer and deserializer (ujson)
from __future__ import annotations

import os

import ujson

from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import (
    AWSEncryptionSDKProvider,
)
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN = os.getenv("KMS_KEY_ARN", "")

encryption_provider = AWSEncryptionSDKProvider(
    keys=[KMS_KEY_ARN],
    json_serializer=ujson.dumps,
    json_deserializer=ujson.loads,
)
data_masker = DataMasking(provider=encryption_provider)


def lambda_handler(event: dict, context: LambdaContext) -> str:
    data: dict = event.get("body", {})

    return data_masker.encrypt(data)

Using multiple keys

You can use multiple KMS keys from more than one AWS account for higher availability, when instantiating AWSEncryptionSDKProvider.

using_multiple_keys.py
from __future__ import annotations

import os

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import (
    AWSEncryptionSDKProvider,
)
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN_1 = os.getenv("KMS_KEY_ARN_1", "")
KMS_KEY_ARN_2 = os.getenv("KMS_KEY_ARN_2", "")

encryption_provider = AWSEncryptionSDKProvider(keys=[KMS_KEY_ARN_1, KMS_KEY_ARN_2])
data_masker = DataMasking(provider=encryption_provider)

logger = Logger()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> dict:
    data: dict = event.get("body", {})

    logger.info("Encrypting the whole object")

    encrypted = data_masker.encrypt(data)

    return {"body": encrypted}
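Conceptually, using multiple keys means the same data key is encrypted once per master key, so any one of them can later recover it. The sketch below shows that idea with a toy XOR cipher from the standard library. It is not secure and not the AWS Encryption SDK's message format; the function names and the `key-1`/`key-2` identifiers are hypothetical.

```python
from __future__ import annotations

import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. Toy construction for illustration, NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_with_keys(master_keys: dict[str, bytes], plaintext: bytes) -> dict:
    data_key = secrets.token_bytes(32)  # one data key per message
    return {
        "ciphertext": toy_cipher(data_key, plaintext),
        # One encrypted copy of the data key per master key
        "encrypted_data_keys": {
            key_id: toy_cipher(master_key, data_key) for key_id, master_key in master_keys.items()
        },
    }


def decrypt_with_key(key_id: str, master_key: bytes, message: dict) -> bytes:
    # Any single master key is enough to recover the data key
    data_key = toy_cipher(master_key, message["encrypted_data_keys"][key_id])
    return toy_cipher(data_key, message["ciphertext"])


master_keys = {"key-1": secrets.token_bytes(32), "key-2": secrets.token_bytes(32)}
message = encrypt_with_keys(master_keys, b"sensitive")
via_first = decrypt_with_key("key-1", master_keys["key-1"], message)
via_second = decrypt_with_key("key-2", master_keys["key-2"], message)
```

The data itself is encrypted only once; what grows with each additional key is the number of encrypted data key copies, and with KMS the number of API calls.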

Providers

AWS Encryption SDK

You can modify the following values when initializing the AWSEncryptionSDKProvider to best accommodate your security and performance thresholds.

| Parameter | Default | Description |
| --------- | ------- | ----------- |
| local_cache_capacity | 100 | The maximum number of entries that can be retained in the local cryptographic materials cache |
| max_cache_age_seconds | 300 | The maximum time (in seconds) that a cache entry may be kept in the cache |
| max_messages_encrypted | 4294967296 | The maximum number of messages that may be encrypted under a cache entry |
| max_bytes_encrypted | 9223372036854775807 | The maximum number of bytes that may be encrypted under a cache entry |

If required, you can customize the default values when initializing the AWSEncryptionSDKProvider class.

aws_encryption_provider_example.py
from __future__ import annotations

import os

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import (
    AWSEncryptionSDKProvider,
)
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN = os.getenv("KMS_KEY_ARN", "")

encryption_provider = AWSEncryptionSDKProvider(
    keys=[KMS_KEY_ARN],
    local_cache_capacity=200,
    max_cache_age_seconds=400,
    max_messages_encrypted=200,
    max_bytes_encrypted=2000,
)
data_masker = DataMasking(provider=encryption_provider)

logger = Logger()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> dict:
    data: dict = event.get("body", {})

    logger.info("Encrypting the whole object")

    encrypted = data_masker.encrypt(data)

    return {"body": encrypted}
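To get a feel for how the age, message, and byte thresholds interact, here is a simplified model of a single cache entry. `CachedDataKey` is a hypothetical class written for this explanation. The real cache lives inside the AWS Encryption SDK and its exact boundary conditions may differ.

```python
import time
from dataclasses import dataclass, field


@dataclass
class CachedDataKey:
    """Toy model of one cache entry and its reuse limits."""

    max_cache_age_seconds: float = 300
    max_messages_encrypted: int = 4294967296
    max_bytes_encrypted: int = 9223372036854775807
    created_at: float = field(default_factory=time.monotonic)
    messages: int = 0
    bytes_encrypted: int = 0

    def can_encrypt(self, size: int) -> bool:
        """A cached data key is reused only while every threshold still holds."""
        age = time.monotonic() - self.created_at
        return (
            age <= self.max_cache_age_seconds
            and self.messages + 1 <= self.max_messages_encrypted
            and self.bytes_encrypted + size <= self.max_bytes_encrypted
        )

    def record(self, size: int) -> None:
        """Account for one message of `size` bytes encrypted under this entry."""
        self.messages += 1
        self.bytes_encrypted += size


entry = CachedDataKey(max_messages_encrypted=2, max_bytes_encrypted=2000)
entry.record(500)
entry.record(500)
reusable = entry.can_encrypt(100)  # message limit reached, a new data key is needed
```

Lower thresholds mean data keys are rotated more often, at the cost of more calls to KMS.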
Passing additional SDK arguments

As an escape hatch mechanism, you can pass additional arguments to the AWSEncryptionSDKProvider via the provider_options parameter.

For example, the AWS Encryption SDK defaults to using the AES_256_GCM_HKDF_SHA512_COMMIT_KEY_ECDSA_P384 algorithm for encrypting your Data Key. If you want, you have the flexibility to customize and choose a different encryption algorithm.

changing_default_algorithm.py
from __future__ import annotations

import os

from aws_encryption_sdk.identifiers import Algorithm

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import AWSEncryptionSDKProvider
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN = os.getenv("KMS_KEY_ARN", "")

encryption_provider = AWSEncryptionSDKProvider(keys=[KMS_KEY_ARN])
data_masker = DataMasking(provider=encryption_provider)

logger = Logger()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> str:
    data: dict = event.get("body", {})

    logger.info("Encrypting whole object with a different algorithm")

    provider_options = {"algorithm": Algorithm.AES_256_GCM_HKDF_SHA512_COMMIT_KEY}

    encrypted = data_masker.encrypt(
        data,
        provider_options=provider_options,
    )

    return encrypted

Data masking request flow

The following sequence diagrams explain how DataMasking behaves under different scenarios.

Erase operation

Erasing operations occur in-memory and we cannot recover the original value.

sequenceDiagram
    autonumber
    participant Client
    participant Lambda
    participant DataMasking as Data Masking (in memory)
    Client->>Lambda: Invoke (event)
    Lambda->>DataMasking: erase(data)
    DataMasking->>DataMasking: replaces data with *****
    Note over Lambda,DataMasking: No encryption providers involved.
    DataMasking->>Lambda: data masked
    Lambda-->>Client: Return response
Simple masking operation

Encrypt operation with Encryption SDK (KMS)

We call KMS to generate a unique data key that can be reused for multiple encrypt operations in-memory. This improves performance, reduces cost, and helps prevent throttling.

To make this operation simpler to visualize, we keep caching details in a separate sequence diagram. Caching is enabled by default.

sequenceDiagram
    autonumber
    participant Client
    participant Lambda
    participant DataMasking as Data Masking
    participant EncryptionProvider as Encryption Provider
    Client->>Lambda: Invoke (event)
    Lambda->>DataMasking: Init Encryption Provider with master key
    Note over Lambda,DataMasking: AWSEncryptionSDKProvider([KMS_KEY])
    Lambda->>DataMasking: encrypt(data)
    DataMasking->>EncryptionProvider: Create unique data key
    Note over DataMasking,EncryptionProvider: KMS GenerateDataKey API
    DataMasking->>DataMasking: Cache new unique data key
    DataMasking->>DataMasking: DATA_KEY.encrypt(data)
    DataMasking->>DataMasking: MASTER_KEY.encrypt(DATA_KEY)
    DataMasking->>DataMasking: Create encrypted message
    Note over DataMasking: Encrypted message includes encrypted data, data key encrypted, algorithm, and more.
    DataMasking->>Lambda: Ciphertext from encrypted message
    Lambda-->>Client: Return response
Encrypting operation using envelope encryption.
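
The envelope encryption steps can be sketched with the standard library alone. This is a toy model of the data flow, not the Encryption SDK: `toy_cipher` is a deliberately simple XOR keystream that is NOT secure, a random `master_key` stands in for a KMS key, and every function name here is hypothetical.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR `data` with a SHA-256 derived keystream.

    Applying it twice with the same key restores the input.
    Toy cipher for illustration only; NOT secure.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(master_key: bytes, plaintext: bytes) -> dict:
    # Stands in for the KMS GenerateDataKey call
    data_key = secrets.token_bytes(32)
    return {
        # DATA_KEY.encrypt(data)
        "ciphertext": toy_cipher(data_key, plaintext),
        # MASTER_KEY.encrypt(DATA_KEY)
        "encrypted_data_key": toy_cipher(master_key, data_key),
    }


def envelope_decrypt(master_key: bytes, message: dict) -> bytes:
    # Unwrap the data key first, then use it to recover the plaintext
    data_key = toy_cipher(master_key, message["encrypted_data_key"])
    return toy_cipher(data_key, message["ciphertext"])


master_key = secrets.token_bytes(32)
message = envelope_encrypt(master_key, b"sensitive value")
recovered = envelope_decrypt(master_key, message)
```

The message carries the encrypted data key next to the ciphertext, so the plaintext data key never needs to be stored. Only the holder of the master key can unwrap it.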

Encrypt operation with multiple KMS Keys

When encrypting data with multiple KMS keys, the aws_encryption_sdk makes additional API calls to encrypt the data key with each of the specified keys.

sequenceDiagram
    autonumber
    participant Client
    participant Lambda
    participant DataMasking as Data Masking
    participant EncryptionProvider as Encryption Provider
    Client->>Lambda: Invoke (event)
    Lambda->>DataMasking: Init Encryption Provider with master key
    Note over Lambda,DataMasking: AWSEncryptionSDKProvider([KEY_1, KEY_2])
    Lambda->>DataMasking: encrypt(data)
    DataMasking->>EncryptionProvider: Create unique data key
    Note over DataMasking,EncryptionProvider: KMS GenerateDataKey API - KEY_1
    DataMasking->>DataMasking: Cache new unique data key
    DataMasking->>DataMasking: DATA_KEY.encrypt(data)
    DataMasking->>DataMasking: KEY_1.encrypt(DATA_KEY)
    loop For every additional KMS Key
        DataMasking->>EncryptionProvider: Encrypt DATA_KEY
        Note over DataMasking,EncryptionProvider: KMS Encrypt API - KEY_2
    end
    DataMasking->>DataMasking: Create encrypted message
    Note over DataMasking: Encrypted message includes encrypted data, all data keys encrypted, algorithm, and more.
    DataMasking->>Lambda: Ciphertext from encrypted message
    Lambda-->>Client: Return response
Encrypting operation using envelope encryption.

Decrypt operation with Encryption SDK (KMS)

We call KMS to decrypt the encrypted data key available in the encrypted message. If successful, we run authentication (context) and integrity checks (algorithm, data key length, etc.) before proceeding.

Lastly, we decrypt the original encrypted data, throw away the decrypted data key for security reasons, and return the original plaintext data.

sequenceDiagram
    autonumber
    participant Client
    participant Lambda
    participant DataMasking as Data Masking
    participant EncryptionProvider as Encryption Provider
    Client->>Lambda: Invoke (event)
    Lambda->>DataMasking: Init Encryption Provider with master key
    Note over Lambda,DataMasking: AWSEncryptionSDKProvider([KMS_KEY])
    Lambda->>DataMasking: decrypt(data)
    DataMasking->>EncryptionProvider: Decrypt encrypted data key
    Note over DataMasking,EncryptionProvider: KMS Decrypt API
    DataMasking->>DataMasking: Authentication and integrity checks
    DataMasking->>DataMasking: DATA_KEY.decrypt(data)
    DataMasking->>DataMasking: MASTER_KEY.encrypt(DATA_KEY)
    DataMasking->>DataMasking: Discards decrypted data key
    DataMasking->>Lambda: Plaintext
    Lambda-->>Client: Return response
Decrypting operation using envelope encryption.
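
The authentication and integrity checks that run before decryption can be illustrated with a small stand-in. The Encryption SDK performs its own checks internally. This sketch only shows the idea of binding an encryption context to the ciphertext, using an HMAC from the standard library as an assumed substitute. `context_tag` and `verify_before_decrypt` are hypothetical names.

```python
import hashlib
import hmac
import json


def context_tag(data_key: bytes, context: dict, ciphertext: bytes) -> bytes:
    """Compute a tag that binds the encryption context to the ciphertext."""
    canonical = json.dumps(context, sort_keys=True).encode()
    # Length-prefix the context so context and ciphertext bytes cannot be confused
    payload = len(canonical).to_bytes(4, "big") + canonical + ciphertext
    return hmac.new(data_key, payload, hashlib.sha256).digest()


def verify_before_decrypt(data_key: bytes, message: dict, expected_context: dict) -> bool:
    """Return True only if the context matches and the message was not altered."""
    if message["context"] != expected_context:
        return False  # authentication check: caller supplied a different context
    expected = context_tag(data_key, message["context"], message["ciphertext"])
    # Integrity check: constant-time comparison of the stored and recomputed tags
    return hmac.compare_digest(expected, message["tag"])


data_key = b"k" * 32  # placeholder for the decrypted data key
message = {"context": {"tenant_id": "a12"}, "ciphertext": b"opaque-bytes"}
message["tag"] = context_tag(data_key, message["context"], message["ciphertext"])

ok = verify_before_decrypt(data_key, message, {"tenant_id": "a12"})
wrong_context = verify_before_decrypt(data_key, message, {"tenant_id": "b34"})
tampered = verify_before_decrypt(
    data_key, dict(message, ciphertext=b"tampered-bytes"), {"tenant_id": "a12"}
)
```

Decryption should only continue when the check passes. Both a mismatched context and a modified ciphertext cause the sketch to reject the message.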

Caching encrypt operations with Encryption SDK

Without caching, every encrypt() operation would generate a new data key. This would significantly increase latency and cost in ephemeral, short-running environments like Lambda.

With caching, we balance ephemeral Lambda environment performance characteristics with adjustable thresholds to meet your security needs.

Data key recycling

We request a new data key when a cached data key exceeds any of the following security thresholds:

  1. Max age in seconds
  2. Max number of encrypted messages
  3. Max bytes encrypted across all operations

sequenceDiagram
    autonumber
    participant Client
    participant Lambda
    participant DataMasking as Data Masking
    participant EncryptionProvider as Encryption Provider
    Client->>Lambda: Invoke (event)
    Lambda->>DataMasking: Init Encryption Provider with master key
    Note over Lambda,DataMasking: AWSEncryptionSDKProvider([KMS_KEY])
    Lambda->>DataMasking: encrypt(data)
    DataMasking->>EncryptionProvider: Create unique data key
    Note over DataMasking,EncryptionProvider: KMS GenerateDataKey API
    DataMasking->>DataMasking: Cache new unique data key
    DataMasking->>DataMasking: DATA_KEY.encrypt(data)
    DataMasking->>DataMasking: MASTER_KEY.encrypt(DATA_KEY)
    DataMasking->>DataMasking: Create encrypted message
    Note over DataMasking: Encrypted message includes encrypted data, data key encrypted, algorithm, and more.
    DataMasking->>Lambda: Ciphertext from encrypted message
    Lambda->>DataMasking: encrypt(another_data)
    DataMasking->>DataMasking: Searches for data key in cache
    alt Is Data key in cache?
        DataMasking->>DataMasking: Reuses data key
    else Is Data key evicted from cache?
        DataMasking->>EncryptionProvider: Create unique data key
        DataMasking->>DataMasking: MASTER_KEY.encrypt(DATA_KEY)
    end
    DataMasking->>DataMasking: DATA_KEY.encrypt(data)
    DataMasking->>DataMasking: Create encrypted message
    DataMasking->>Lambda: Ciphertext from encrypted message
    Lambda-->>Client: Return response
Caching data keys during encrypt operation.
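
The three recycling thresholds can be sketched as a small cache that hands out a data key and replaces it once any limit would be exceeded. This is an illustrative model, not the Encryption SDK's caching logic: `DataKeyCache`, its parameters, and the `generated` counter are hypothetical, and a random byte string stands in for a key returned by KMS.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedDataKey:
    key: bytes
    created_at: float
    messages_encrypted: int = 0
    bytes_encrypted: int = 0


class DataKeyCache:
    """Hypothetical single-entry cache that recycles a data key on three thresholds."""

    def __init__(
        self,
        max_age_seconds: float,
        max_messages: int,
        max_bytes: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_age_seconds = max_age_seconds
        self.max_messages = max_messages
        self.max_bytes = max_bytes
        self.generated = 0  # counts simulated GenerateDataKey calls
        self._clock = clock
        self._entry: Optional[CachedDataKey] = None

    def _must_recycle(self, entry: CachedDataKey, incoming_bytes: int) -> bool:
        return (
            self._clock() - entry.created_at > self.max_age_seconds  # 1. max age
            or entry.messages_encrypted + 1 > self.max_messages  # 2. max messages
            or entry.bytes_encrypted + incoming_bytes > self.max_bytes  # 3. max bytes
        )

    def key_for(self, payload: bytes) -> bytes:
        entry = self._entry
        if entry is None or self._must_recycle(entry, len(payload)):
            # Stands in for requesting a new data key from KMS
            entry = CachedDataKey(key=secrets.token_bytes(32), created_at=self._clock())
            self._entry = entry
            self.generated += 1
        entry.messages_encrypted += 1
        entry.bytes_encrypted += len(payload)
        return entry.key


now = [0.0]  # controllable clock so the example is deterministic
cache = DataKeyCache(max_age_seconds=300, max_messages=2, max_bytes=1000, clock=lambda: now[0])

k1 = cache.key_for(b"a" * 10)  # first call: a new key is generated
k2 = cache.key_for(b"b" * 10)  # within all limits: the key is reused
k3 = cache.key_for(b"c" * 10)  # third message exceeds max_messages: new key
now[0] = 301.0
k4 = cache.key_for(b"d")  # key is older than max_age_seconds: new key
```

Lower thresholds limit how much data a single data key protects, at the cost of more key generation calls. Higher thresholds trade the other way.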

Testing your code

Testing erase operation

Testing your code with a simple erase operation

from dataclasses import dataclass

import pytest
import test_lambda_mask


@dataclass
class LambdaContext:
    function_name: str = "test"
    memory_limit_in_mb: int = 128
    invoked_function_arn: str = "arn:aws:lambda:eu-west-1:111111111:function:test"
    aws_request_id: str = "52fdfc07-2182-154f-163f-5f0f9a621d72"

    def get_remaining_time_in_millis(self) -> int:
        return 5


@pytest.fixture
def lambda_context() -> LambdaContext:
    return LambdaContext()


def test_encrypt_lambda(lambda_context):
    # GIVEN: A sample event for testing
    event = {"testkey": "testvalue"}

    # WHEN: Invoking the lambda_handler function with the sample event and Lambda context
    result = test_lambda_mask.lambda_handler(event, lambda_context)

    # THEN: Assert that the result matches the expected output
    assert result == {"testkey": "*****"}
from __future__ import annotations

from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.typing import LambdaContext

data_masker = DataMasking()


def lambda_handler(event: dict, context: LambdaContext) -> dict:
    data = event

    erased = data_masker.erase(data, fields=["testkey"])

    return erased