The next version will use AWS Lambda mounting EFS (directory structure & metadata) + DynamoDB (prefix locking) + S3 (actual storage), which guarantees stronger ACID properties.
The lock mechanism (sketched in code below):
- read
    - Insert one record (path=path, rw=r) into DynamoDB as a lock.
    - Perform a BEGINS_WITH query to check for any earlier WRITE_LOCKs that have not yet timed out; if there are, poll until we can perform our action.
    - Remove the record when done.
- write
    - Insert one record (path=path, rw=w) into DynamoDB as a lock.
    - Perform a BEGINS_WITH query to check for any earlier READ_LOCKs or WRITE_LOCKs that have not yet timed out; if there are, poll until we can perform our action.
    - Remove the record when done.
As for moves, we lock on the common prefix of the src & dst directories.
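Below is a minimal sketch of this lock flow with boto3. The table name `nas_locks`, the key layout (partition key `root` = project_root, sort key `path#lock_id`), and the timeout values are assumptions for illustration only; the composite sort key is used because DynamoDB's BEGINS_WITH condition applies to the sort key.

```python
import time
import uuid
import boto3
from boto3.dynamodb.conditions import Key

# Assumed table layout (illustrative): partition key "root", sort key "sk" = "<path>#<lock_id>",
# plus attributes "rw" ("r" or "w") and "expires_at" (unix-epoch lock timeout).
table = boto3.resource("dynamodb").Table("nas_locks")

LOCK_TTL = 30        # seconds before a lock is considered timed out (assumption)
POLL_INTERVAL = 0.5  # seconds between polls (assumption)


def acquire_lock(root: str, path: str, rw: str) -> str:
    """Insert a lock record, then poll until no conflicting lock remains."""
    lock_id = str(uuid.uuid4())
    sk = f"{path}#{lock_id}"
    table.put_item(Item={
        "root": root,
        "sk": sk,
        "rw": rw,
        "expires_at": int(time.time()) + LOCK_TTL,
    })
    while True:
        # BEGINS_WITH query over the path prefix, as described above.
        resp = table.query(
            KeyConditionExpression=Key("root").eq(root) & Key("sk").begins_with(path)
        )
        now = int(time.time())
        conflicts = [
            item for item in resp["Items"]
            if item["sk"] != sk                   # ignore our own record
            and item["expires_at"] > now          # ignore timed-out locks
            and (rw == "w" or item["rw"] == "w")  # writes block everything, reads only block writes
        ]
        if not conflicts:
            return lock_id
        time.sleep(POLL_INTERVAL)


def release_lock(root: str, path: str, lock_id: str) -> None:
    """Remove the lock record after the operation is done."""
    table.delete_item(Key={"root": root, "sk": f"{path}#{lock_id}"})
```

For a move, the same helper would be called with the common prefix of the src & dst directories as `path`.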
The ideal structure would be (project_root, lock usage) / users (or groups) / ... For resolving the sharing model, we'll always iterate through the whole tree under project_root.
A NAS that manages the directory structure with a filesystem and stores the objects in S3.
- High throughput : Thanks to S3.
- Versioning : Important feature.
- Access Control : CRUD x private / link / account whitelist / ..., which can be easily integrated with other modules.
- Transaction : Not supported, due to the consistency / concurrency strategy; users should handle this on their own (e.g. disallow multiple sessions of the same account that may modify the same object).
- Duplicated filename : Duplicated filenames are allowed by design; we use the Object Name as the unique identifier.
- Directory : Directories are implemented.
- Download : Currently we only support downloading a single file. If you'd like to download a whole directory, you should handle it in your client, e.g. by maintaining the file tree and handling duplicated dirnames / filenames.
- Create
- Modify / Upload New Version
- Delete
Once an Object has been marked deleted, even if a Modify / Upload New Version operation was successfully performed on S3 after the Delete operation, the Object stays marked deleted.
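A hedged illustration of this rule, assuming a hypothetical in-memory metadata record (the real metadata layout is not specified here):

```python
def finish_upload(meta: dict, new_version_id: str) -> dict:
    """Register a completed Modify / Upload New Version, unless the Object is already deleted."""
    if meta.get("deleted"):
        # The Delete won the race: even though the new version reached S3 after
        # the Delete, the Object stays marked deleted and the late upload is ignored.
        return meta
    meta["versions"].append(new_version_id)
    meta["current"] = new_version_id
    return meta


# Hypothetical usage: the Object was marked deleted before the upload completed.
meta = {"name": "report.pdf", "deleted": True, "versions": ["v1"], "current": "v1"}
assert finish_upload(meta, "v2")["current"] == "v1"  # the late v2 is ignored
```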
- (Client) Ask NAS for the Object Names of all subdirs (because we may have duplicated dirnames).
- (NAS) Authorize & response.
- (Client) Ask NAS for S3 upload dst.
- (NAS) Authorize the request.
- (NAS) Generate a presignedPOST.
- (NAS) Modify the filetree.
- (NAS) (TODO) Add the S3 upload dst to the DynamoDB table `ongoing`, with the same expiration time as the presignedPOST (sketched after this flow).
- (Client) Perform the upload to S3.
- (Client) Tell NAS you've done uploading.
- (NAS) Modify the filetree.
- (NAS) (TODO) Delete the record in the `ongoing` table.
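Below is a rough sketch of the NAS side of this flow: generating the presignedPOST, recording it in the `ongoing` table with the same expiration, and cleaning it up on completion (authorization is omitted). The bucket name, the table's key layout, and the helper names are assumptions made for illustration.

```python
import time
import uuid
import boto3

s3 = boto3.client("s3")
ongoing = boto3.resource("dynamodb").Table("ongoing")  # assumed key: "s3_key"

BUCKET = "nas-objects"  # assumed bucket name
UPLOAD_TTL = 900        # seconds; presignedPOST and the ongoing record expire together


def create_upload(object_name: str) -> dict:
    """Generate a presignedPOST for the client and track it in the ongoing table."""
    s3_key = str(uuid.uuid4())  # the S3 upload dst, decoupled from the Object Name
    post = s3.generate_presigned_post(
        Bucket=BUCKET,
        Key=s3_key,
        ExpiresIn=UPLOAD_TTL,
    )
    # TODO step from the flow above: same expiration as the presignedPOST,
    # so abandoned uploads can be detected and cleaned up later.
    ongoing.put_item(Item={
        "s3_key": s3_key,
        "object_name": object_name,
        "expires_at": int(time.time()) + UPLOAD_TTL,
    })
    return post  # {"url": ..., "fields": {...}} for the client to POST to S3


def complete_upload(s3_key: str) -> None:
    """Client reports the upload is done: update the filetree, then drop the record."""
    # ... modify the filetree here ...
    ongoing.delete_item(Key={"s3_key": s3_key})
```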
- Concurrency
If User A and User B list the tree as below:

```
root - dirA - a.txt
     \- dirB - b.txt
```

User A moves b.txt to dirA, and User B deletes dirA. It's possible that both succeed:

```
root - (deleted) dirA - a.txt
     |                \- b.txt
     \- dirB
```
Which means User B doesn't know he just deleted the b.txt inside dirA.
The Application should handle this on its own.
- Q : Why don't we use Cognito?
A : Flexibility & pricing. If we validate every request, $0.0055 per MAU is not a good deal, especially since developers taking this service as a module may already have their own account management system.