feat(data_masking): add new sensitive data masking utility #2197
Merged
72 commits
2d5bfcc
Added logic for sensitive data masking and unit tests
seshubaws b2e4d10
Merge branch 'develop' into develop
leandrodamascena 7d65c7d
Merge branch 'develop' into develop
leandrodamascena 4b0d0c0
Restructured into smaller files, fixed linting errors
seshubaws b34a1ca
Fix linting errors
seshubaws 092c165
Merge branch 'awslabs:develop' into develop
seshubaws 4ec6603
Merge branch 'develop' of https://github.com/seshubaws/aws-lambda-pow…
seshubaws 7b13c6f
Merge branch 'awslabs:develop' into develop
seshubaws c6ec149
Lint tests
seshubaws 8bc8c02
Merge branch 'develop' of https://github.com/seshubaws/aws-lambda-pow…
seshubaws 21759b5
Fix mypy errors
seshubaws 6a2e98a
Fixing tests
seshubaws d1b6690
Merge branch 'develop' into develop
leandrodamascena d39d956
mypy fixes
seshubaws 2157815
Merge branch 'develop' of https://github.com/seshubaws/aws-lambda-pow…
seshubaws 97c5b85
Fixed passing in context for aws encryption sdk provider
seshubaws f722e70
Use d pytest library for unit testing
seshubaws d5f014b
Raise error for unimplemented dm provider
seshubaws bef87e0
Fix context for encryption sdk provider
seshubaws 65eb7e3
Add type annotation to context
seshubaws fb3fbc6
Fix context
seshubaws ec9f49f
Fixing tests
seshubaws 98ba4d9
Added markdown-lint to pre-commit yaml
seshubaws f48c2f5
Merging from develop + creating extra dependencies
leandrodamascena 3ad5046
Merging from develop + creating extra dependencies
leandrodamascena 5b7e256
Merging from develop + creating extra dependencies
leandrodamascena b9053d9
Revisions per comments
seshubaws 0193ee6
Added performance benchmarking tests
seshubaws 22f0b46
Update aws_lambda_powertools/utilities/data_masking/providers/aws_enc…
seshubaws ece4643
Update aws_lambda_powertools/utilities/data_masking/providers/aws_enc…
seshubaws 8299039
Removed args and ItsDangerous and commented on tests
seshubaws c36deb5
Merge branch 'develop' of https://github.com/seshubaws/aws-lambda-pow…
seshubaws 5423f7f
Merge branch 'develop' of https://github.com/aws-powertools/powertool…
seshubaws 27eca17
Added functional tests and put input data in separate file
seshubaws 876f4f7
Merge branch 'develop' into develop
heitorlessa fe37c50
Applied patch to update lock to latest range deps
seshubaws 2eab50b
Made unit tests more legible, removed parameterization
seshubaws 57a5a3a
Adding E2E tests (wip)
seshubaws 8aabc7f
Added data_masking constants, made into BaseProvider and added types
seshubaws bbeaa4e
Add check for encryption_context in Encryption SDK
seshubaws 5b794f7
Fixing enc_context e2e tests
seshubaws 2955c9c
Added test to encrypt&decrypt from logs in e2e tests
seshubaws b15b866
Added custom exception for enc_context mismatch, used pytest fixtures…
seshubaws ee3dddc
Added some docstrings and typing
seshubaws a79f3df
Added test for using DataMasking in a lambda handler, wip due to inco…
seshubaws 7483d46
Merge remote-tracking branch 'upstream/develop' into develop
seshubaws 7883a48
Revised singleton class to allow for one instance per different confi…
seshubaws 7127c9c
Removed itsdangerous dependencies
seshubaws 01885a5
Added serializer for aws enc sdk
seshubaws 5b83b66
chore: fix merge conflict, remove itsdangerous leftovers (#2)
heitorlessa 371ea05
Building data within func tests instead of using setup.py
seshubaws b3d123d
Updated json serializer for aws encrypt sdk to return original data type
seshubaws c0c3f2f
Added ability for user input custom json de/serializer in base class
seshubaws c5233af
Apply patch for use latest manylinux
seshubaws bcc735a
Added KMS permissions to lambda handler for e2e tests
seshubaws eee4c86
Clarified variable names and documented logic (wip still need to disc…
seshubaws ab15acd
Polished var names, error strings, documentation, etc
seshubaws 73ae382
Added a stack for load testing data masking and added artillery confi…
seshubaws 39a835e
Added 1024MB funcs and load tested with them
seshubaws da24bcf
Removed orchestrator function and test since same test in E2E
seshubaws 970df5c
Removed singleton class from code and load and e2e tests
seshubaws 487dc0e
Merge from upstream
seshubaws 069aa94
Fix linting errors
seshubaws ee325f4
Fix mypy errors
seshubaws 49afeed
Modified data masking test names
seshubaws 73df808
Fix dummy KMS key for correct parsing
seshubaws 1ea59f0
Bumping cryptography library
leandrodamascena ba534ed
Setting default region to avoid HTTP connection
leandrodamascena ceb6131
Removing user agent tracking
leandrodamascena bf0e4ed
Reverting
leandrodamascena 6a064b1
Creating a specific provider instead a client to avoid any http call …
leandrodamascena c01ea35
Merge branch 'develop' into develop
leandrodamascena
@@ -0,0 +1,5 @@
from aws_lambda_powertools.utilities.data_masking.base import DataMasking

__all__ = [
    "DataMasking",
]
@@ -0,0 +1,170 @@
import json
from typing import Optional, Union

from aws_lambda_powertools.utilities.data_masking.provider import BaseProvider


class DataMasking:
    """
    A utility class for masking sensitive data within various data types.

    This class provides methods for masking sensitive information, such as personal
    identifiers or confidential data, within different data types such as strings,
    dictionaries, lists, and more. It helps protect sensitive information while
    preserving the structure of the original data.

    Usage:
    Instantiate an object of this class and use its methods to mask sensitive data
    based on the data type. Supported data types include strings, dictionaries,
    and more.

    Example:
    ```
    from aws_lambda_powertools.utilities.data_masking.base import DataMasking

    def lambda_handler(event, context):
        masker = DataMasking()

        data = {
            "project": "powertools",
            "sensitive": "xxxxxxxxxx"
        }

        masked = masker.mask(data, fields=["sensitive"])

        return masked

    ```
    """

    def __init__(self, provider: Optional[BaseProvider] = None):
        self.provider = provider or BaseProvider()

    def encrypt(self, data, fields=None, **provider_options):
        return self._apply_action(data, fields, self.provider.encrypt, **provider_options)

    def decrypt(self, data, fields=None, **provider_options):
        return self._apply_action(data, fields, self.provider.decrypt, **provider_options)

    def mask(self, data, fields=None, **provider_options):
        return self._apply_action(data, fields, self.provider.mask, **provider_options)

    def _apply_action(self, data, fields, action, **provider_options):
        """
        Helper method to determine whether to apply a given action to the entire input data
        or to specific fields if the 'fields' argument is specified.

        Parameters
        ----------
        data : any
            The input data to process.
        fields : Optional[List[any]] = None
            A list of fields to apply the action to. If 'None', the action is applied to the entire 'data'.
        action : Callable
            The action to apply to the data. It should be a callable that performs an operation on the data
            and returns the modified value.

        Returns
        -------
        any
            The modified data after applying the action.
        """

        if fields is not None:
            return self._apply_action_to_fields(data, fields, action, **provider_options)
        else:
            return action(data, **provider_options)

    def _apply_action_to_fields(
        self,
        data: Union[dict, str],
        fields: list,
        action,
        **provider_options,
    ) -> Union[dict, str]:
        """
        This method takes the input data, which can be either a dictionary or a JSON string,
        and applies a mask, an encryption, or a decryption to the specified fields.

        Parameters
        ----------
        data : Union[dict, str]
            The input data to process. It can be either a dictionary or a JSON string.
        fields : List
            A list of fields to apply the action to. Each field is a string of dot-separated
            keys describing the path to a nested value in the dictionary (for example 'a.b.c').
        action : Callable
            The action to apply to the fields. It should be a callable that takes the current
            value of the field as the first argument and any additional arguments that might be required
            for the action. It performs an operation on the current value using the provided arguments and
            returns the modified value.
        **provider_options:
            Additional keyword arguments to pass to the 'action' function.

        Returns
        -------
        dict
            The modified dictionary after applying the action to the
            specified fields.

        Raises
        ------
        ValueError
            If 'fields' parameter is None.
        TypeError
            If the 'data' parameter is not a traversable type

        Example
        -------
        ```python
        >>> data = {'a': {'b': {'c': 1}}, 'x': {'y': 2}}
        >>> fields = ['a.b.c', 'x.y']
        # The function will transform the value at 'a.b.c' (1) and 'x.y' (2)
        # and store the result as:
        new_dict = {'a': {'b': {'c': 'transformed_value'}}, 'x': {'y': 'transformed_value'}}
        ```
        """

        if fields is None:
            raise ValueError("No fields specified.")

        if isinstance(data, str):
            # Parse JSON string as dictionary
            my_dict_parsed = json.loads(data)
        elif isinstance(data, dict):
            # In case their data has keys that are not strings (i.e. ints), convert it all into a JSON string
            my_dict_parsed = json.dumps(data)
            # Turn back into dict so can parse it
            my_dict_parsed = json.loads(my_dict_parsed)
        else:
            raise TypeError(
                f"Unsupported data type for 'data' parameter. Expected a traversable type, but got {type(data)}.",
            )

        # For example: 'a.b.c' in ['a.b.c', 'x.y']
        for nested_key in fields:
            # Prevent overriding loop variable
            curr_nested_key = nested_key

            # If the nested_key is not a string, convert it to a string representation
            if not isinstance(curr_nested_key, str):
                curr_nested_key = json.dumps(curr_nested_key)

            # Split the nested key string into a list of nested keys
            # 'a.b.c' -> ['a', 'b', 'c']
            keys = curr_nested_key.split(".")

            # Initialize a current dictionary to the root dictionary
            curr_dict = my_dict_parsed

            # Traverse the dictionary hierarchy by iterating through the list of nested keys
            for key in keys[:-1]:
                curr_dict = curr_dict[key]

            # Retrieve the final value of the nested field
            valtochange = curr_dict[(keys[-1])]

            # Apply the specified 'action' to the target value
            curr_dict[keys[-1]] = action(valtochange, **provider_options)

        return my_dict_parsed
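The dotted-path traversal in `_apply_action_to_fields` can be tried in isolation. The following is a minimal sketch, not the code from this PR: `apply_to_fields` is a hypothetical standalone function that mirrors the same steps (JSON round trip, split on ".", walk to the parent, replace the leaf), and the lambda stands in for a provider action.

```python
import json
from typing import Any, Callable, List, Union


def apply_to_fields(data: Union[dict, str], fields: List[Any], action: Callable[[Any], Any]) -> dict:
    """Hypothetical standalone helper that mirrors the traversal shown in the diff."""
    if fields is None:
        raise ValueError("No fields specified.")

    if isinstance(data, str):
        parsed = json.loads(data)
    elif isinstance(data, dict):
        # Round trip through JSON: non-string keys (such as ints) become strings,
        # and the result is a deep copy, so the caller's dict is left untouched.
        parsed = json.loads(json.dumps(data))
    else:
        raise TypeError(f"Expected a dict or JSON string, but got {type(data)}.")

    for field in fields:
        # Non-string field names are converted the same way the keys were.
        path = field if isinstance(field, str) else json.dumps(field)
        *parents, leaf = path.split(".")

        node = parsed
        for key in parents:
            node = node[key]  # a missing key raises KeyError

        node[leaf] = action(node[leaf])

    return parsed


data = {"a": {"b": {"c": 1}}, "x": {"y": 2}, 3: "three"}
result = apply_to_fields(data, ["a.b.c", "x.y", 3], lambda value: "*****")
# result == {"a": {"b": {"c": "*****"}}, "x": {"y": "*****"}, "3": "*****"}
```

Two consequences of this design are worth noting: the returned dictionary always has string keys (the integer key `3` comes back as `"3"`), and a JSON string input is returned as a dictionary rather than re-serialized.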
@@ -0,0 +1,5 @@
DATA_MASKING_STRING: str = "*****"
CACHE_CAPACITY: int = 100
MAX_CACHE_AGE_SECONDS: float = 300.0
MAX_MESSAGES_ENCRYPTED: int = 200
# NOTE: You can also set max messages/bytes per data key
5 changes: 5 additions & 0 deletions
aws_lambda_powertools/utilities/data_masking/provider/__init__.py
@@ -0,0 +1,5 @@
from aws_lambda_powertools.utilities.data_masking.provider.base import BaseProvider

__all__ = [
    "BaseProvider",
]
34 changes: 34 additions & 0 deletions
aws_lambda_powertools/utilities/data_masking/provider/base.py
@@ -0,0 +1,34 @@
import json
from typing import Any

from aws_lambda_powertools.utilities.data_masking.constants import DATA_MASKING_STRING


class BaseProvider:
    """
    Base class for data masking providers. Subclasses must override encrypt() and decrypt();
    calling either method on a provider that does not implement it raises NotImplementedError.
    """

    def __init__(self, json_serializer=None, json_deserializer=None) -> None:
        self.json_serializer = json_serializer or self.default_json_serializer
        self.json_deserializer = json_deserializer or self.default_json_deserializer

    def default_json_serializer(self, data):
        return json.dumps(data).encode("utf-8")

    def default_json_deserializer(self, data):
        return json.loads(data.decode("utf-8"))

    def encrypt(self, data) -> str:
        raise NotImplementedError("Subclasses must implement encrypt()")

    def decrypt(self, data) -> Any:
        raise NotImplementedError("Subclasses must implement decrypt()")

    def mask(self, data) -> Any:
        if isinstance(data, (str, dict, bytes)):
            return DATA_MASKING_STRING
        elif isinstance(data, (list, tuple, set)):
            return type(data)([DATA_MASKING_STRING] * len(data))
        return DATA_MASKING_STRING
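To see what the default `mask` returns for each input type, the branch logic can be copied into a free function. This is a sketch for illustration only: the module-level `mask` below is a hypothetical stand-in for `BaseProvider.mask`, with `DATA_MASKING_STRING` redefined locally so the snippet runs without the library installed.

```python
DATA_MASKING_STRING = "*****"  # same value as in constants.py in this PR


def mask(data):
    """Stand-in for BaseProvider.mask, copied out of the class for experimentation."""
    if isinstance(data, (str, dict, bytes)):
        # Strings, dicts and bytes are replaced wholesale by the mask string.
        return DATA_MASKING_STRING
    elif isinstance(data, (list, tuple, set)):
        # Sequences keep their type and length; every element becomes the mask string.
        return type(data)([DATA_MASKING_STRING] * len(data))
    # Anything else (ints, floats, None, ...) is also replaced by the mask string.
    return DATA_MASKING_STRING


masked_str = mask("secret")           # "*****"
masked_dict = mask({"a": 1})          # "*****" (the dict itself is not preserved)
masked_list = mask([1, 2, 3])         # ["*****", "*****", "*****"]
masked_tuple = mask((1, 2))           # ("*****", "*****")
masked_set = mask({1, 2, 3})          # {"*****"} (duplicates collapse in a set)
masked_int = mask(42)                 # "*****"
```

Lists and tuples keep their length, but a masked set always collapses to a single element because every member becomes the same string. Masking a whole dict discards its structure; masking individual keys requires passing `fields` to `DataMasking.mask`.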
5 changes: 5 additions & 0 deletions
aws_lambda_powertools/utilities/data_masking/provider/kms/__init__.py
@@ -0,0 +1,5 @@
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import AwsEncryptionSdkProvider

__all__ = [
    "AwsEncryptionSdkProvider",
]