

Cloud Engineer, from Ostermundigen
Data Masking of AWS Lambda Function Logs
What are the problems?
Existing Approaches:
CloudWatch Logs Native Data Masking
Natively, AWS CloudWatch Logs allows data masking by using managed or custom data identifiers and data protection policies. Data identifiers are pattern-matching rules or machine learning models which detect sensitive data. Data protection policies are JSON documents describing the operations to perform on the identified sensitive data. The operation can be set to just “audit” or to “de-identify” the data, in which case only principals authorized to perform the logs:Unmask action would be able to see the data.
Note that only new data written to CloudWatch Logs will be masked according to the defined policies. Someone with access to the logs would still be able to see the sensitive data written before enabling data masking.
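For illustration, a data protection policy pairs an audit statement with a de-identify statement. A minimal sketch (the policy name, data identifier, and log group name below are examples of mine, not from any real setup — check the CloudWatch Logs documentation for the identifiers you need) could look like this:

```python
import json

# Sketch of a CloudWatch Logs data protection policy document.
policy = {
    "Name": "mask-pii-policy",       # example name
    "Version": "2021-06-01",
    "Statement": [
        {
            "Sid": "audit",
            "DataIdentifier": [
                "arn:aws:dataprotection::aws:data-identifier/EmailAddress"
            ],
            "Operation": {"Audit": {"FindingsDestination": {}}},
        },
        {
            "Sid": "deidentify",
            "DataIdentifier": [
                "arn:aws:dataprotection::aws:data-identifier/EmailAddress"
            ],
            "Operation": {"Deidentify": {"MaskConfig": {}}},
        },
    ],
}

# It would then be attached to a log group, e.g. with boto3:
# boto3.client("logs").put_data_protection_policy(
#     logGroupIdentifier="/aws/lambda/my-function",  # hypothetical log group
#     policyDocument=json.dumps(policy),
# )
```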
Although this approach prevents unauthorized personnel from accessing sensitive data, it does not help with complying with the right to be forgotten.
AWS Lambda Powertools Data Masking
AWS Lambda Powertools is “a developer toolkit to implement Serverless best practices and increase developer velocity”, originally developed in Python but also available for Java, TypeScript and .NET. As of now, only the Python version offers a utility for data masking.
Two approaches are proposed: one uses a KMS key to encrypt/decrypt the sensitive information inside the log; the other simply erases the sensitive information before writing the logs. To implement the first approach and also comply with regulations like the right to be forgotten, you would need one encryption key per customer and a way to encrypt each customer’s information with their own key. Should a customer exercise their right to be forgotten, you simply delete their encryption key, making their data forever unrecoverable.
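This per-customer-key idea is sometimes called crypto-shredding. A dependency-free toy sketch can illustrate the mechanics (the key store and the one-time-pad “cipher” below are stand-ins of my own, not KMS, and are not a production cipher):

```python
import secrets

class ToyKeyStore:
    """Toy stand-in for one KMS key per customer (illustration only)."""

    def __init__(self):
        self._keys: dict[str, bytes] = {}

    def encrypt(self, customer_id: str, plaintext: bytes) -> bytes:
        # Create the customer's key on first use (one-time pad, toy only).
        key = self._keys.setdefault(customer_id, secrets.token_bytes(len(plaintext)))
        return bytes(a ^ b for a, b in zip(plaintext, key))

    def decrypt(self, customer_id: str, ciphertext: bytes) -> bytes:
        key = self._keys[customer_id]  # raises KeyError once the key is gone
        return bytes(a ^ b for a, b in zip(ciphertext, key))

    def forget(self, customer_id: str) -> None:
        # Deleting the key makes every ciphertext for this customer unrecoverable.
        del self._keys[customer_id]

store = ToyKeyStore()
token = store.encrypt("customer-42", b"+41 79 000 00 00")
assert store.decrypt("customer-42", token) == b"+41 79 000 00 00"
store.forget("customer-42")  # right to be forgotten: the data is now gone for good
```

With real KMS, the “forget” step would correspond to scheduling the deletion of the customer’s key (e.g. via `kms.schedule_key_deletion`).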
Although those approaches can address both problems, you must know exactly what to encrypt or erase. For example, to erase the phone numbers in a list of customers, you would need to do something like this:
```python
data_masker.erase(data, fields=["customers[*].phone_number"])
```
But what if you are unsure at the start of a project about the data structure and the content? What if the data schema changes? What if you forgot a field in a nested JSON structure?
Erasing All PII by Default
Do you really need sensitive information like PII in application logs?
Probably not.
In that case, the AWS Lambda Powertools data erasing approach seems like the simplest one. But again, it only works as long as you know the data structure and it doesn’t change. As a security/compliance officer, how can I make sure the developers don’t forget to erase sensitive information?
So I wanted to improve on the AWS Lambda Powertools approach and erase sensitive information wherever it appears in the logs…
This is what I came up with based on the AWS Lambda Powertools data masking utility.
1 | Create a Function to Erase Sensitive Data
```python
import json
from warnings import catch_warnings
from functools import wraps, partial
from decimal import Decimal
from typing import Any

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider import BaseProvider


def is_valid_json_string(json_string: str) -> bool:
    if isinstance(json_string, str):
        try:
            result = json.loads(json_string)
            return isinstance(result, dict)
        except json.JSONDecodeError:
            return False
    return False


def log_masking_decorator(masked_fields: list[str]):
    def decorator(func):
        @wraps(func)
        def wrapper(self, msg, *args, **kwargs):
            if is_valid_json_string(msg) or isinstance(msg, dict):
                # catch_warnings(action=...) requires Python 3.11+
                with catch_warnings(action="ignore"):
                    msg = self.data_masker.erase(msg, fields=masked_fields)
            return func(self, msg, *args, **kwargs)
        return wrapper
    return decorator
```
Code explanations:
- The data_masker.erase() function only works on dictionaries and strings containing a JSON object, so we need to verify the type of the message before erasing the data.
- The AWS Lambda Powertools Data Masker raises a warning if you instruct it to mask a field which it can’t find. With this approach, where I want to globally define a list of fields to mask everywhere, this would result in a lot of warnings in CloudWatch Logs, which I don’t want. So I ignore the warnings before calling the erase() method.
2 | Apply the Function on all Logging Methods
```python
def decorate_log_methods(decorator):
    def decorate(cls):
        for attr in dir(cls):
            if callable(getattr(cls, attr)) and attr in [
                "info",
                "error",
                "warning",
                "exception",
                "debug",
                "critical",
            ]:
                setattr(cls, attr, decorator(getattr(cls, attr)))
        return cls
    return decorate
```
3 | Create a Custom Logger Class
```python
def decimal_serializer(obj: Any) -> Any:
    if isinstance(obj, Decimal):
        obj = str(obj)
    return obj


@decorate_log_methods(
    log_masking_decorator(
        masked_fields=[
            "$.[*].phoneNumber",
            "$..[*].phoneNumber",
            "$.[*].name",
            "$..[*].name",
        ]
    )
)
class CustomLogger(Logger):
    def __init__(self):
        super().__init__()
        self.datamasking_provider = BaseProvider(
            json_serializer=partial(json.dumps, default=decimal_serializer),
            json_deserializer=json.loads,
        )
        self.data_masker = DataMasking(
            provider=self.datamasking_provider,
            raise_on_missing_field=False,
        )
```
Code explanations:
- I use a custom JSON serializer here to convert Python Decimal values into strings to avoid serialization errors.
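A minimal repro of that serializer, independent of Powertools (the item value is a made-up example, such as a DynamoDB number attribute):

```python
import json
from decimal import Decimal
from typing import Any

def decimal_serializer(obj: Any) -> Any:
    # Called by json.dumps for objects it cannot serialize natively.
    if isinstance(obj, Decimal):
        obj = str(obj)
    return obj

item = {"price": Decimal("19.99")}  # e.g. a DynamoDB number attribute

# json.dumps(item) alone would raise:
#   TypeError: Object of type Decimal is not JSON serializable
print(json.dumps(item, default=decimal_serializer))  # → {"price": "19.99"}
```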
4 | Usage
```python
from aws_lambda_powertools.utilities.typing import LambdaContext

from log_helpers import CustomLogger

logger = CustomLogger()


@logger.inject_lambda_context(log_event=True)
def lambda_handler(event: dict, context: LambdaContext):
    response = boto3_client.whatever_service_api()  # placeholder for any AWS API call
    logger.info(response)
```
Code explanations:
- The inject_lambda_context decorator calls logger.info(). Since the logger here is our custom logger, all PII listed in our CustomLogger class decorator will be erased from the Lambda event logs.
This achieves the goal of enforcing the erasure of all the listed PII without the developer having to specifically list each field to erase on every logging action.
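To see the class-decoration mechanism in isolation, here is a dependency-free sketch where a dummy logger and a simplified top-level redactor (both my own, much simpler than the Powertools version) stand in for the real classes:

```python
from functools import wraps

def redact_decorator(fields: set[str]):
    """Replace the listed top-level keys before the real log call runs."""
    def decorator(func):
        @wraps(func)
        def wrapper(self, msg, *args, **kwargs):
            if isinstance(msg, dict):
                msg = {k: ("*****" if k in fields else v) for k, v in msg.items()}
            return func(self, msg, *args, **kwargs)
        return wrapper
    return decorator

def decorate_log_methods(decorator):
    # Same pattern as in step 2: wrap every logging method of the class.
    def decorate(cls):
        for attr in ("info", "error", "warning", "debug"):
            if callable(getattr(cls, attr, None)):
                setattr(cls, attr, decorator(getattr(cls, attr)))
        return cls
    return decorate

@decorate_log_methods(redact_decorator(fields={"phoneNumber", "name"}))
class DummyLogger:
    def __init__(self):
        self.records = []

    def info(self, msg):
        self.records.append(msg)

logger = DummyLogger()
logger.info({"name": "Jane", "phoneNumber": "+41 79 000 00 00", "country": "CH"})
print(logger.records[0])  # → {'name': '*****', 'phoneNumber': '*****', 'country': 'CH'}
```

The developer calls logger.info() as usual; the redaction is enforced by the class decorator, not by each call site.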
The full code of the custom logger is available here. The repository contains a full demo showing how to secure an AWS API Gateway.
Would I Use That in Production?
No.
Parsing the entire JSON structure of every log increases the latency of your Lambda function’s responses, which is not something you want. As the AWS Lambda Powertools documentation says, logging the Lambda handler’s event should only be done in non-production environments. You should also know the data your Lambda function is handling, and therefore erase only the specific sensitive fields where necessary, for efficiency.
I still find it an interesting approach which could be useful in some cases. Test environments should not have production data, but hey, we have all seen those cases out there…
It was nevertheless an interesting exercise to try.