Design Google Drive/Dropbox
FR:
- Upload files/media
- Upload limit per file
- Shareable
- User Auth
- CRUD operations on uploaded file
- Synchronization
NFR:
- Data availability
- Data integrity
- Fast downloads
Data storage:
- 1B users, average 15GB/user
- 1B * 15GB = 15EB; 15EB*3 replication = 45EB
- Need CRUD; need ACID for sync
- SQL for file metadata and S3 for blob storage
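
The storage estimate can be checked with a few lines of arithmetic. This is a rough sketch that assumes decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes); the constant names are illustrative only.

```python
# Back-of-envelope capacity estimate using decimal units.
USERS = 1_000_000_000             # 1B users (from the notes)
AVG_BYTES_PER_USER = 15 * 10**9   # 15 GB per user (from the notes)
REPLICATION_FACTOR = 3            # 3x replication (from the notes)

EB = 10**18  # bytes per exabyte (decimal)

raw_bytes = USERS * AVG_BYTES_PER_USER
replicated_bytes = raw_bytes * REPLICATION_FACTOR

print(f"raw: {raw_bytes / EB:.0f} EB")                # raw: 15 EB
print(f"replicated: {replicated_bytes / EB:.0f} EB")  # replicated: 45 EB
```

At this scale the blobs have to live in an object store; only the much smaller file metadata is a candidate for a sharded SQL database.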
Components:
- Uploader service
- Updates to storage in chunks
- Updates Metadata service with what was uploaded - this would probably be a gRPC call
- Receives individual chunk from client and uploads to Blob
- Metadata service
- Talks to metadata DB
- Gets input from clients about the chunk and metadata they have
- Gets input from Uploader service about the chunks and metadata uploaded from client
- Offloads syncing to all devices with updated metadata via sync service
- Sync service
- Powered by a message queue
- Communicate only the diff
- Push-pull model for all types of documents
- If file, push the metadata change; the client can pull what it doesn’t have
- If large file, pull the entire file
- Metadata DB
- Replication service
- Offline replication of all shards in all zones
- Clients
- Store some metadata about the state of each file on that client
- List of all files
- Chunk info of each file
- Locations
- Versions
- Last updated time
- Has a chunker that does the actual chunking work
- Has an indexer to store which chunk goes where and re-create the index when there is a change in the local client - talks to sync component
- Sends data to indexer whenever there is a change in local client
- Inline deduplication to avoid storing same files in server
- Metadata partitioning for scale
- Caching of hot files
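
A minimal sketch of how the client chunker, the chunk index, inline deduplication, and diff-only sync could fit together. It assumes fixed-size chunks identified by their SHA-256 hash, which the notes do not specify. `BlobStore`, `missing_chunks`, and `CHUNK_SIZE` are hypothetical names, and an in-memory dict stands in for the real blob storage.

```python
import hashlib
from typing import Dict, List

# Assumption: tiny fixed-size chunks for demonstration; real systems use MB-sized chunks.
CHUNK_SIZE = 4


class BlobStore:
    """In-memory stand-in for blob storage, keyed by chunk hash."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def put(self, data: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
        """Chunk the data, store only unseen chunks, and return the file's
        chunk index (ordered list of chunk hashes)."""
        index: List[str] = []
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # inline deduplication
                self.chunks[digest] = chunk
            index.append(digest)
        return index

    def get(self, index: List[str]) -> bytes:
        """Reassemble a file from its chunk index."""
        return b"".join(self.chunks[digest] for digest in index)


def missing_chunks(local_index: List[str], remote_index: List[str]) -> List[str]:
    """Return the hashes in remote_index that the client does not hold yet,
    in order and without repeats. These are the only chunks it needs to pull."""
    have = set(local_index)
    result: List[str] = []
    for digest in remote_index:
        if digest not in have:
            have.add(digest)
            result.append(digest)
    return result


store = BlobStore()
v1 = store.put(b"aaaabbbbcccc")   # three new chunks stored
v2 = store.put(b"aaaaXXXXcccc")   # only the middle chunk is new
print(len(store.chunks))          # 4
print(missing_chunks(v1, v2) == [v2[1]])  # True
```

A client holding version `v1` receives the new index as a metadata push and pulls only the one changed chunk. Because chunks are keyed by content hash, identical chunks uploaded by different users or files are stored once.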