Seafile Cloud Storage Platform TEACHING

The document summarizes Seafile, an open source scalable cloud storage system. It describes Seafile's features like fast file syncing between devices, scalability to large storage capacities, and collaborative features. The system design uses a lightweight database and object storage for metadata and file contents. This allows Seafile to scale horizontally and provide high performance. The document also outlines Seafile's roadmap including file locking and improved authorization.


Seafile - Scalable Cloud Storage System

Johnathan Xu
Seafile Ltd.
Agenda
Seafile Introduction
Feature Overview
System Design & Performance
Roadmap
What is Seafile?

VS

Seafile is a FAST, SCALABLE, and PRIVATE file sync & share solution
What can Seafile do?
• Fast and reliable file sync between cloud and devices
• Scales to millions of files, PB-class storage
• High performance, lightweight
• Productive file collaboration
– Groups
– File preview, discussion
– Message and notification
Who is using Seafile?
• https://github.com/haiwen/seafile
2400+ stars
• Estimated at least 100K users worldwide, most in Europe

Universities in Rhineland-Palatinate (Germany)
Royal Belgian Institute of Natural Sciences
Agenda
Seafile Introduction
Feature Overview
System Design & Performance
Roadmap
File Sync and Share
• Files are organized into Libraries
• Selectively sync libraries to devices
• Sync with existing folder
• Client-side end-to-end data encryption
• Full platform support: Win, OSX, Linux, mobile
• Share to a person or a group
• Read-write and read-only share
• LDAP/AD integration
View all your libraries in the home page
All libraries shared to a group
Desktop Client

Selective sync library


Cloud file browser
Starred files
Notifications
Desktop Client
Collaboration
• File activities
• Group discussion
• File discussion
• Message notifications
File Activities
Message Notifications
Agenda
Seafile Introduction
Feature Overview
System Design & Performance
Roadmap
Server Architecture

Seafile is a “file system” built on top of object storage

Non-POSIX, user space, lightweight


File System Design
Relational DB: head commit ID
Object storage: objects addressed by SHA-1 ID

Data model similar to Git
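
As a rough illustration of what a Git-like data model can look like, the sketch below stores commit, dir, file, and block objects under the SHA-1 of their content and keeps a single mutable head commit ID per library. The `ObjectStore` class, the JSON layout, and the library name are assumptions made for this example, not Seafile's actual on-disk format.

```python
import hashlib
import json


class ObjectStore:
    """In-memory stand-in for an object storage backend (hypothetical)."""

    def __init__(self):
        self.objects = {}

    def put(self, payload: bytes) -> str:
        # The object ID is the SHA-1 of the content, so identical content
        # always maps to the same ID and is stored only once.
        obj_id = hashlib.sha1(payload).hexdigest()
        self.objects[obj_id] = payload
        return obj_id

    def get(self, obj_id: str) -> bytes:
        return self.objects[obj_id]


def put_json(store: ObjectStore, obj: dict) -> str:
    # Serialize deterministically so equal objects get equal IDs.
    return store.put(json.dumps(obj, sort_keys=True).encode())


store = ObjectStore()
block_id = store.put(b"hello world")
file_id = put_json(store, {"type": "file", "blocks": [block_id]})
dir_id = put_json(store, {"type": "dir", "entries": {"a.txt": file_id}})
commit_id = put_json(store, {"type": "commit", "root": dir_id, "parent": None})

# The only mutable state: one head commit ID per library, which the
# slides place in the relational DB.
head_commit = {"my-library": commit_id}
```

Because every object is immutable and named by its hash, the database only has to track one small value per library.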


Design Advantage
• Object storage is more scalable than a file system
– Heavy DB + file system vs. light DB + object storage
• No database bottleneck
– Metadata is in object storage
– File-system-level versioning vs. file-level versioning
• File system designed for syncing
– Storage/Network deduplication
– No upload/download limit, fast upload
• Backend daemons implemented in C
Deduplication
Dedup with Content Defined Chunking (CDC) algorithm
Only store/send delta between file system snapshots

[Diagram: Commit 2 links back to Commit 1. Each commit points to a Dir; the Dirs reference File 1 v1, File 2, and File 1 v2, which in turn reference Blocks 1 to 5.]
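
To make content-defined chunking concrete, here is a minimal sketch that uses a gear-style rolling hash to pick chunk boundaries. The slides do not say which rolling hash or chunk sizes Seafile uses, so the hash, `MASK`, `MIN_SIZE`, and `MAX_SIZE` below are illustrative assumptions only.

```python
import hashlib
import random

_rng = random.Random(0)  # fixed seed so chunk boundaries are reproducible
GEAR = [_rng.getrandbits(32) for _ in range(256)]
MASK = (1 << 10) - 1     # a boundary roughly every 1 KiB (illustrative)
MIN_SIZE = 256
MAX_SIZE = 8192


def chunk(data: bytes):
    """Split data at content-defined boundaries using a gear rolling hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def block_ids(data: bytes):
    return [hashlib.sha1(c).hexdigest() for c in chunk(data)]


rng = random.Random(1)
v1 = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
v2 = v1[:1000] + b"inserted bytes" + v1[1000:]  # small edit near the start

ids1, ids2 = block_ids(v1), block_ids(v2)
new_blocks = set(ids2) - set(ids1)  # only these would need to be stored or sent
```

Since boundaries depend on the bytes themselves rather than on fixed offsets, an insertion shifts only the chunks around the edit; later chunks keep their IDs and are deduplicated.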


Cluster Architecture

MySQL cluster
Load Balancer
Ceph/Swift/S3
Seafile Servers

• Seafile server is stateless, scales horizontally


• Head commit ID and user-library mapping in MySQL cluster
• All data and metadata in object storage
Fast and Reliable File Syncing
• Detect file changes with OS mechanisms
• Low CPU usage on client and server side
• Sync 100K files easily and quickly
• No data transfer after rename/move
• Don’t send duplicate files. Delta detection.
• Handles conflicts
– Concurrent updates
– Case conflict: sync ABC.txt and abc.txt to Windows
• Never remove a file unless the user does
Devil is in the details
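
The case-conflict item can be illustrated with a small detection sketch: before checking files out to a case-insensitive file system, group names that differ only by case. The function name is hypothetical, and the slides do not describe how Seafile actually resolves such conflicts, so only detection is shown.

```python
from collections import defaultdict


def find_case_conflicts(names):
    """Group names that would collide on a case-insensitive file system."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


conflicts = find_case_conflicts(["ABC.txt", "abc.txt", "notes.md"])
```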
How Syncing Works
1: Client uploads commit, dir, file, and block objects
2: Server writes objects to object storage
3: Server updates the head commit ID in the relational DB after objects are saved
4: Client downloads objects and checks them out to the folder

Almost looks like Git
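
The ordering matters: the head commit ID is updated only after the objects are saved, so a reader never sees a head that points at missing data. A minimal sketch of that ordering follows; the `Server` class and its method names are assumptions for this example rather than Seafile's API.

```python
import hashlib


class Server:
    """Toy server: a dict for object storage, a dict for the head table."""

    def __init__(self):
        self.object_storage = {}  # object ID -> bytes
        self.head = {}            # library ID -> head commit ID

    def write_objects(self, objects):
        for payload in objects:
            self.object_storage[hashlib.sha1(payload).hexdigest()] = payload

    def update_head(self, library, commit_id):
        # Refuse to publish a commit whose object has not been saved yet.
        if commit_id not in self.object_storage:
            raise ValueError("commit object missing from object storage")
        self.head[library] = commit_id


def upload(server, library, objects, commit_payload):
    commit_id = hashlib.sha1(commit_payload).hexdigest()
    server.write_objects(objects + [commit_payload])  # write objects first
    server.update_head(library, commit_id)            # then move the head
    return commit_id


server = Server()
cid = upload(server, "lib", [b"block", b"dir"], b"commit")
```

If the upload is interrupted before the head moves, the library still points at the previous commit and the orphaned objects are harmless.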


Syncing Performance
• Keep version info for the whole fs tree
– Combine many file updates into 1 commit
– A few database writes for a few K files
• Results
– 1 core, 1GB memory VM server
– 40K small files, ~20 files/s upload and download;
single TCP connection; server CPU 2% - 5%
– Big file, ~8MB/s upload and download on a 100Mbps network; server CPU 50%
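
Taking the approximate figures above at face value, a quick back-of-the-envelope calculation shows what they imply. The variable names are only for this example.

```python
files = 40_000
files_per_second = 20
total_minutes = files / files_per_second / 60     # about 33 minutes for all files

throughput_mb_per_s = 8
throughput_mbit_per_s = throughput_mb_per_s * 8   # 64 Mbit/s of payload
```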
Agenda
Seafile Introduction
Feature Overview
System Design & Performance
Roadmap
Roadmap
• Sync & Share
– File locking for better collaboration
– Hierarchical access control within a library
• Auth integration
– OAuth
– Shibboleth
• Improve GUI responsiveness with backbone.js
Conclusion
• Do one thing and do one thing well
– Reliability
– Scalability
– Performance
Choose any three ;-)
• Lightweight DB + Object Storage
• Git-like data model, no client-side history
• Syncing model similar to Git, redesigned for
auto syncing
Thanks!
File Syncing Algorithm
• Client data in 3 stages: worktree, index, repo
– Worktree: user-visible folder, one worktree per library
– Index file: last modification time of each file in the worktree
– Repo: internal representation of the latest fs tree for the library. Holds only delta blocks.

commit: worktree → index → repo
checkout: repo → index → worktree
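
One way to picture the role of the index is as a cache of modification times that lets the client find changed files without rereading their content. The sketch below assumes a plain dict as the index; the function names are hypothetical and Seafile's real index format is not described in the slides.

```python
import os
import tempfile


def scan(worktree):
    """Map each relative file path to its last modification time (ns)."""
    state = {}
    for root, _dirs, files in os.walk(worktree):
        for name in files:
            path = os.path.join(root, name)
            state[os.path.relpath(path, worktree)] = os.stat(path).st_mtime_ns
    return state


def changed_paths(index, worktree):
    """Compare the worktree against the index; return (modified, deleted)."""
    current = scan(worktree)
    modified = {p for p, mtime in current.items() if index.get(p) != mtime}
    deleted = set(index) - set(current)
    return modified, deleted


with tempfile.TemporaryDirectory() as worktree:
    a = os.path.join(worktree, "a.txt")
    with open(a, "w") as f:
        f.write("v1")
    index = scan(worktree)  # snapshot taken at the last commit

    with open(os.path.join(worktree, "b.txt"), "w") as f:
        f.write("new file")
    # Simulate a later edit of a.txt by moving its mtime forward 2 seconds.
    later = index["a.txt"] + 2_000_000_000
    os.utime(a, ns=(later, later))

    modified, deleted = changed_paths(index, worktree)
```

Only the paths reported here would need to be rechunked and committed to the repo.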
File Syncing Algorithm

Sync State Machine


File Syncing Algorithm
• Upload
– Client creates a new commit from a batch of local changes
– Diff between the local repo and the cached server fs tree
– After objects are uploaded, update the server head commit ID in the database
– Server merges concurrent updates and resolves conflicts

Commit from client A


HEAD commit on server

Commit from client B
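
The merge of concurrent commits from clients A and B can be sketched as a three-way merge over path-to-file-ID mappings, using the old server HEAD as the common base. This is a generic technique shown under simplifying assumptions (flat trees, conflicts resolved by keeping one side and reporting the path); the slides do not specify Seafile's actual conflict policy.

```python
def merge(base, ours, theirs):
    """Three-way merge of {path: file_id} trees; returns (merged, conflicts)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            winner = o                        # both sides agree
        elif o == b:
            winner = t                        # only "theirs" changed
        elif t == b:
            winner = o                        # only "ours" changed
        else:
            winner = o if o is not None else t  # both changed: keep a copy
            conflicts.append(path)
        if winner is not None:
            merged[path] = winner
    return merged, conflicts


base = {"a.txt": "f1", "b.txt": "f1"}
client_a = {"a.txt": "f2", "b.txt": "f1"}                # A edits a.txt
client_b = {"a.txt": "f1", "b.txt": "f1", "c.txt": "f3"}  # B adds c.txt
merged, conflicts = merge(base, client_a, client_b)
```

Changes that touch different paths merge cleanly; only paths changed differently on both sides are reported as conflicts.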


File Syncing Algorithm
• Version check (init)
– Client caches the server’s head commit ID
– Compare with the server every 30s; if they differ, trigger a download
• Download
– Server calculates the update list with a diff
– Client downloads the update and applies it to the worktree
– Update the cached server head commit ID
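
The version check and download steps can be sketched as one polling round that compares head commit IDs and, when they differ, diffs the two trees into an update list. The flat `{path: file_id}` trees and function names are assumptions for this example; a real client would repeat the round on a timer (every 30s per the slide).

```python
def diff(old_tree, new_tree):
    """Compute the update list that turns old_tree into new_tree."""
    updates = []
    for path in sorted(set(old_tree) | set(new_tree)):
        old, new = old_tree.get(path), new_tree.get(path)
        if old == new:
            continue
        if new is None:
            updates.append(("delete", path))
        elif old is None:
            updates.append(("add", path))
        else:
            updates.append(("modify", path))
    return updates


def check_once(cached_head, server_head, trees):
    """One polling round; returns (new cached head, update list)."""
    if cached_head == server_head:
        return cached_head, []  # nothing to download
    return server_head, diff(trees[cached_head], trees[server_head])


# Hypothetical commit IDs mapped to their fs trees.
trees = {
    "c1": {"a.txt": "f1"},
    "c2": {"a.txt": "f2", "b.txt": "f3"},
}
cached, updates = check_once("c1", "c2", trees)
```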
