'FUSE'ing Python for rapid development of storage efficient file-system

PyCon APAC ‘12
Singapore, Jun 07-09, 2012

Chetan Giridhar, Vishal Kanaujia
File Systems
• Provide a way to organize, store, retrieve and manage
  information
• Abstraction layer
• File system:
   – Maps a name to an object
   – Maps objects to file contents
• File system types (format):
   – Media based file systems (FAT, ext2)
   – Network file systems (NFS)
   – Special-purpose file systems (procfs)
• Services
   – open(), read(), write(), close()…
[Diagram: User (user space) → File System (kernel space) → Hardware]
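The services listed above are what an application ultimately calls, whatever file system sits underneath. A minimal sketch in Python, using the thin `os` wrappers around open(), write(), read() and close(); the helper name `roundtrip` and the temporary file are illustrative assumptions, not part of the talk:

```python
import os
import tempfile


def roundtrip(payload: bytes) -> bytes:
    """Write payload to a temporary file and read it back with os-level calls."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")

        # open() + write() + close()
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
        try:
            os.write(fd, payload)
        finally:
            os.close(fd)

        # open() + read() + close()
        fd = os.open(path, os.O_RDONLY)
        try:
            return os.read(fd, len(payload))
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(roundtrip(b"hello file system"))
```

The same calls would work unchanged on a path inside a mounted FUSE file system, which is the point of the abstraction layer.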
Virtual File-system
• Supports multiple file systems in *NIX
• VFS is an abstract layer in the kernel
• Decouples file system implementation from the
  interface (POSIX API)
   – Common API serving different file system types
• Handles user calls related to file systems
   – Implements generic FS actions
   – Directs each request to the file-system-specific code that handles it
• Associates (and disassociates) devices with instances of
  the appropriate file system
File System: VFS

User space
  User                     Myapp.py: open(), read()
  glibc.so                 glibc.so: open(), read()
---- System call interface ----
Kernel space
  System call              sys_open(), sys_read()
  VFS                      vfs_open(), vfs_read()
  ext2 | NFS | procfs
  Block layer / Device drivers / Hardware
Developing FS in *NIX
• Traditionally, file systems are developed in the kernel
• It is a complex task
     •   Requires understanding kernel libraries and modules
     •   Requires development experience in kernel space
     •   Managing disk I/O
     •   Time consuming and tedious development
          – Frequent kernel panics
          – Higher testing effort
     •   Kernel bloat and side effects such as security risks
Solution: User space
• In user space:
  – Shorter development cycle
  – Easy to update fixes, test and distribute
  – More flexibility
     • The same programming tools, debuggers, and libraries you
       would use to develop standard *NIX applications
• User-space file-systems
  – File systems become regular applications (as
    opposed to kernel extensions)
FUSE (Filesystem in USErspace)
• Implement a file system in user-space
   – no kernel code required!
• Secure, non-privileged mounts
• Useful to develop “virtual” file-systems
   – Allows you to imagine “anything” as a file ☺
   – Data can come from local disk, across the network, from memory, or any
     combination of these
• User operates on a mounted instance of FS through:
   – Unix utilities
   – POSIX libraries
FUSE: Diving deeper

  Myfile.py: open()                 CustomFS: open()
        |                                 |
    glibc.so                         libfuse.so
        |                                 |
  VFS: vfs_open() ---- fuse.ko ---- /dev/fuse
FUSE | develop
• Choices of development in C, C++, Java, … and of course
  Python!
• Python interface for FUSE
   – (FusePython: most popularly used)
• Open source project
   – http://fuse.sourceforge.net/
• For Ubuntu systems:
   $ sudo apt-get install python-fuse
   $ mkdir ./mount_point
   $ python myfuse.py ./mount_point
   $ fusermount -u ./mount_point
FUSE API Overview
• File management
   – open(path)
   – create(path, mode)
   – read(path, length, offset)
   – write(path, data, offset)
• Directory and file system management
   – unlink(path)
   – readdir(path)
• Metadata operations
   – getattr(path)
   – chmod(path, mode)
   – chown(path, uid, gid)
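To make the shape of these callbacks concrete, here is a toy in-memory store whose methods mirror the names and arguments listed above. It is a sketch only: the class name `MemoryFS` is hypothetical, it is not wired to FUSE or to any binding's base class, and returning a negative errno on failure is an assumption borrowed from the code later in this deck.

```python
import errno
import stat


class MemoryFS:
    """Toy store mirroring the callback names on the slide (not mounted)."""

    def __init__(self):
        self.files = {}   # path -> bytearray with the file contents
        self.modes = {}   # path -> permission bits
        self.owners = {}  # path -> (uid, gid)

    # File management
    def create(self, path, mode):
        self.files[path] = bytearray()
        self.modes[path] = mode
        self.owners[path] = (0, 0)
        return 0

    def open(self, path):
        return 0 if path in self.files else -errno.ENOENT

    def read(self, path, length, offset):
        if path not in self.files:
            return -errno.ENOENT
        return bytes(self.files[path][offset:offset + length])

    def write(self, path, data, offset):
        if path not in self.files:
            return -errno.ENOENT
        buf = self.files[path]
        if len(buf) < offset:
            # Pad with zero bytes when writing past the current end
            buf.extend(b"\0" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data
        return len(data)

    # Directory and file system management
    def unlink(self, path):
        if path not in self.files:
            return -errno.ENOENT
        del self.files[path], self.modes[path], self.owners[path]
        return 0

    def readdir(self, path):
        # Flat namespace: every file lives directly under '/'
        return [".", ".."] + sorted(p.lstrip("/") for p in self.files)

    # Metadata operations
    def getattr(self, path):
        if path == "/":
            return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2, "st_size": 0}
        if path not in self.files:
            return -errno.ENOENT
        return {"st_mode": stat.S_IFREG | self.modes[path],
                "st_nlink": 1,
                "st_size": len(self.files[path])}

    def chmod(self, path, mode):
        if path not in self.files:
            return -errno.ENOENT
        self.modes[path] = mode
        return 0

    def chown(self, path, uid, gid):
        if path not in self.files:
            return -errno.ENOENT
        self.owners[path] = (uid, gid)
        return 0
```

A real file system would expose the same operations through the Python-FUSE bindings and be mounted with the commands shown earlier; the dictionary-based attributes here stand in for the binding's stat object.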
seFS – storage efficient FS
• A prototype, experimental file system with:
   – Online data de-duplication (SHA1)
   – Compression (text based)
• SQLite
• Ubuntu 11.04, Python-Fuse Bindings
• Provides following services:
   open()           write()        chmod()
   create()         readdir()      chown()
   read()           unlink()
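The two storage-efficiency ideas can be sketched in a few lines. This is not the seFS implementation: the class name `DedupStore`, hashing whole file contents (rather than blocks), and the use of zlib for compression are all assumptions made for illustration.

```python
import hashlib
import zlib


class DedupStore:
    """Content-addressed store: identical payloads are kept only once."""

    def __init__(self):
        self.blobs = {}  # SHA1 hex digest -> zlib-compressed payload

    def put(self, data: bytes) -> str:
        """Store data if unseen and return its SHA1 digest as the key."""
        digest = hashlib.sha1(data).hexdigest()
        if digest not in self.blobs:
            # First time this content is seen: compress and keep it
            self.blobs[digest] = zlib.compress(data)
        return digest

    def get(self, digest: str) -> bytes:
        """Return the original payload for a digest returned by put()."""
        return zlib.decompress(self.blobs[digest])


if __name__ == "__main__":
    store = DedupStore()
    first = store.put(b"same text " * 100)
    second = store.put(b"same text " * 100)
    print(first == second, len(store.blobs))
```

Two files with identical contents map to the same digest, so the second write costs only a lookup, and repetitive text compresses to far fewer bytes than its original length.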
seFS Architecture
[Diagram: components]
  Your application: issues file system operations
  <<FUSE code>>: myfuse.py
  <<SQLite DB Interface>>: seFS.py
  <<pylibrary>>: SQLiteHandler.py, Compression.py, Crypto.py
  seFS DB
  Storage efficiency = De-duplication + Compression
seFS: Database
CREATE TABLE metadata(                    CREATE TABLE data(
       "id" INTEGER,                           "id" INTEGER
      "abspath" TEXT,                         PRIMARY KEY
     "length" INTEGER,                      AUTOINCREMENT,
       "mtime" TEXT,        seFS                "sha" TEXT,
       "ctime" TEXT,                           "data" BLOB,
       "atime" TEXT,
                           Schema            "length" INTEGER,
     "inode" INTEGER);                     "compressed" BLOB);




                         data table




                         metadata table
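The schema can be exercised with Python's built-in sqlite3 module. The sketch below is an illustration under stated assumptions: the helper names `connect` and `store` are hypothetical, `metadata.id` is assumed to point at the matching `data.id`, and the slides do not say what the "data" and "compressed" columns hold, so raw bytes go in one and zlib-compressed bytes in the other as a guess.

```python
import hashlib
import sqlite3
import time
import zlib

SCHEMA = """
CREATE TABLE metadata(
    "id" INTEGER, "abspath" TEXT, "length" INTEGER,
    "mtime" TEXT, "ctime" TEXT, "atime" TEXT, "inode" INTEGER);
CREATE TABLE data(
    "id" INTEGER PRIMARY KEY AUTOINCREMENT, "sha" TEXT,
    "data" BLOB, "length" INTEGER, "compressed" BLOB);
"""


def connect():
    """Open an in-memory database and create both tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def store(conn, abspath, payload, inode):
    """Record a file; reuse an existing data row when the SHA1 matches."""
    sha = hashlib.sha1(payload).hexdigest()
    row = conn.execute('SELECT "id" FROM data WHERE "sha" = ?', (sha,)).fetchone()
    if row is None:
        cur = conn.execute(
            'INSERT INTO data("sha", "data", "length", "compressed") '
            'VALUES (?, ?, ?, ?)',
            (sha, payload, len(payload), zlib.compress(payload)))
        data_id = cur.lastrowid
    else:
        data_id = row[0]  # duplicate content: no new data row
    now = str(time.time())  # the schema stores times as TEXT
    conn.execute('INSERT INTO metadata VALUES (?, ?, ?, ?, ?, ?, ?)',
                 (data_id, abspath, len(payload), now, now, now, inode))
    return data_id


if __name__ == "__main__":
    db = connect()
    store(db, "/a.txt", b"same content", 1)
    store(db, "/b.txt", b"same content", 2)
    print(db.execute("SELECT COUNT(*) FROM data").fetchone()[0])
```

Storing two paths with identical contents leaves one row in the data table and two rows in the metadata table, which is the de-duplication effect the schema is built for.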
seFS API flow
User operations and the seFS APIs each one triggers, in order:

  $ touch abc      $ rm abc       $ cat >> abc
  getattr()        getattr()      getattr()
  create()         access()       open()
  open()           unlink()       write()
  create()                        flush()
  release()                       release()

[Diagram: the calls reach the seFS DB and its storage-efficiency layer]
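One way to discover such sequences for your own file system is to log every callback as it fires. The sketch below is hypothetical throughout: `traced`, `TracedFS` and `replay_rm` are illustrative names, the stub callbacks do nothing, and the `rm` sequence is replayed by hand rather than driven by a mounted file system.

```python
import functools


def traced(log):
    """Return a decorator that appends each decorated call's name to log."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            log.append(func.__name__)
            return func(*args, **kwargs)
        return wrapper
    return decorator


CALLS = []


class TracedFS:
    """Stub file system: every callback only records that it was invoked."""

    @traced(CALLS)
    def getattr(self, path):
        return 0

    @traced(CALLS)
    def access(self, path, mode):
        return 0

    @traced(CALLS)
    def unlink(self, path):
        return 0


def replay_rm(fs, path):
    """Issue the callback sequence the slide lists for `rm`."""
    fs.getattr(path)
    fs.access(path, 0)
    fs.unlink(path)
    return list(CALLS)


if __name__ == "__main__":
    print(replay_rm(TracedFS(), "/abc"))
```

Applying the same decorator to the callbacks of a mounted file system and then running `touch`, `rm` or `cat` against the mount point would show the order in which the kernel invokes them.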
seFS: Code
import errno
import stat

import fuse

def getattr(self, path):
    sefs = seFS()
    # Named "st" so it does not shadow the stat module used for S_IFDIR/S_IFREG;
    # python-fuse's fuse.Stat uses st_* attribute names
    st = fuse.Stat()
    context = fuse.FuseGetContext()
    # Root
    if path == '/':
        st.st_nlink = 2
        st.st_mode = stat.S_IFDIR | 0o755
    else:
        st.st_mode = stat.S_IFREG | 0o777
        st.st_nlink = 1
        st.st_uid, st.st_gid = (context['uid'], context['gid'])

    # Search for this path in DB
    ret = sefs.search(path)
    # If file exists in DB, get its times
    if ret is True:
        tup = sefs.getutime(path)
        # Times are stored as text; keep only the whole seconds
        st.st_mtime = int(tup[0].strip().split('.')[0])
        st.st_ctime = int(tup[1].strip().split('.')[0])
        st.st_atime = int(tup[2].strip().split('.')[0])
        st.st_ino = int(sefs.getinode(path))
        # Get the file size from DB
        if sefs.getlength(path) is not None:
            st.st_size = int(sefs.getlength(path))
        else:
            st.st_size = 0
        return st
    else:
        return -errno.ENOENT
seFS: Code…
import errno
import time

def create(self, path, flags=None, mode=None):
    sefs = seFS()
    ret = self.open(path, flags)
    if ret == -errno.ENOENT:
        # Create the file in database
        ret = sefs.open(path)
        t = int(time.time())
        mytime = (t, t, t)
        ret = sefs.utime(path, mytime)
        # The count of entries returned by sefs.ls() doubles as the inode number
        self.fd = len(sefs.ls())
        sefs.setinode(path, self.fd)
        return 0


def write(self, path, data, offset):
    # Note: offset is not used; the whole buffer is handed to seFS
    length = len(data)
    sefs = seFS()
    ret = sefs.write(path, data)
    return length
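The write() shown here receives an offset but does not use it. If partial writes had to be supported, the incoming buffer could be merged into the stored contents first. A minimal sketch, assuming the existing contents are available as bytes; the helper name `splice` is hypothetical and not part of seFS:

```python
def splice(existing: bytes, data: bytes, offset: int) -> bytes:
    """Return existing with data written at offset, as a write(2) would."""
    if offset > len(existing):
        # Writing past the end leaves a zero-filled gap
        existing = existing + b"\0" * (offset - len(existing))
    # Keep the bytes before the offset and any bytes after the written range
    return existing[:offset] + data + existing[offset + len(data):]


if __name__ == "__main__":
    print(splice(b"hello world", b"WORLD", 6))
```

The merged result could then be hashed and stored as a whole, which fits a design that de-duplicates on complete file contents.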
seFS: Learning
• Design your file system and define the
  objectives first, before development
• Database schema is crucial
• Skip implementing functionality your file
  system doesn’t intend to support
• Knowledge of the FUSE API is essential
  – FUSE APIs resemble the standard POSIX APIs
  – Documentation of the FUSE API is limited
• Performance?
Conclusion
• FUSE makes file system development far easier than working in kernel space
• Python aids rapid application development (RAD) with Python-Fuse bindings
• seFS: Thought provoking implementation
• Creative applications – your needs and
  objectives
• When are you developing your own File
  system?! ☺
Further Reading
• Sample FUSE based file systems
  – Sshfs
  – YoutubeFS
  – Dedupfs
  – GlusterFS
• Python-Fuse bindings
  – http://fuse.sourceforge.net/
• Linux internal manual
Contact Us
• Chetan Giridhar
  – http://technobeans.com
  – cjgiridhar@gmail.com
• Vishal Kanaujia
  – http://freethreads.wordpress.com
  – vishalkanaujia@gmail.com
Backup
Agenda
• The motivation
• Intro to *NIX File Systems
• Trade off: code in user and kernel space
• FUSE?
• Hold on – What’s VFS?
• Diving into FUSE internals
• Design and develop your own File System with
  Python-FUSE bindings
• Lessons learnt
• Python-FUSE: Creative applications/ Use-cases
User-space and Kernel space
• Kernel-space
   – Kernel code including device drivers
   – Kernel resources (hardware)
   – Privileged user
• User-space
   – Where user applications run
   – Libraries dealing with kernel space
   – System resources
FUSE: Internals
• Three major components:
  – Userspace library (libfuse.*)
  – Kernel module (fuse.ko)
  – Mount utility (fusermount)
• Kernel module hooks into VFS
  – Provides a special device “/dev/fuse”
     • Can be accessed by a user-space process
     • Serves as the interface between the user-space application
       and the fuse kernel module
     • Reads and writes occur on a file descriptor of /dev/fuse
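The request/reply pattern over a file descriptor can be modelled without touching the real device. The sketch below is a toy: a socket pair stands in for /dev/fuse, JSON stands in for the kernel's binary message format, and the names `serve_one` and `demo` are hypothetical.

```python
import json
import socket


def serve_one(channel, handlers):
    """Read one request from the channel, dispatch it, and write the reply."""
    request = json.loads(channel.recv(4096).decode())
    handler = handlers.get(request["op"])
    if handler is None:
        reply = {"error": "ENOSYS"}  # operation not implemented
    else:
        reply = {"result": handler(*request["args"])}
    channel.sendall(json.dumps(reply).encode())


def demo():
    """Play both sides: send a read request and return the decoded reply."""
    kernel_side, daemon_side = socket.socketpair()
    try:
        # "Kernel" side: forward a read(path, length, offset) request
        request = {"op": "read", "args": ["/abc", 5, 0]}
        kernel_side.sendall(json.dumps(request).encode())

        # "User-space" side: the library reads the request and calls the handler
        contents = "hello world"
        handlers = {
            "read": lambda path, length, offset: contents[offset:offset + length],
        }
        serve_one(daemon_side, handlers)

        return json.loads(kernel_side.recv(4096).decode())
    finally:
        kernel_side.close()
        daemon_side.close()


if __name__ == "__main__":
    print(demo())
```

In the real system the user-space library runs this loop for you, so a file system author only supplies the handlers.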
FUSE Workflow

User space
  User (file I/O)            Custom File System
                             User-space FUSE lib
------------------------------------------------
Kernel space
  Virtual File System        FUSE: kernel lib
Facts and figures
• seFS – online storage efficiency
• De-duplication/ compression
   – Managed catalogue information (file meta-data rarely changes)
   – Compression encoded information
• Quick and easy prototyping (Proof of concept)
• Large dataset generation
   – Data generated on demand
Creative applications: FUSE based File systems
• SSHFS: Provides access to a remote file-system
  through SSH
• WikipediaFS: View and edit Wikipedia articles as
  if they were real files
• GlusterFS: Clustered Distributed Filesystem
  having capability to scale up to several petabytes.
• HDFS: FUSE bindings exist for the open
  source Hadoop distributed file system
• seFS: You know it already ☺
Fuse'ing python for rapid development of storage efficient FS

  • 1. 'FUSE'ing Python for rapid development of storage efficient file-system PyCon APAC ‘12 Singapore, Jun 07-09, 2012 Chetan Giridhar, Vishal Kanaujia
  • 2. File Systems • Provides way to organize, store, retrieve and manage information • Abstraction layer • File system: User User space – Maps name to an object Kernel space – Objects to file contents File System • File system types (format): Hardware – Media based file systems (FAT, ext2) – Network file systems (NFS) – Special-purpose file systems (procfs) • Services – open(), read(), write(), close()…
  • 3. Virtual File-system • To support multiple FS in *NIX • VFS is an abstract layer in kernel • Decouples file system implementation from the interface (POSIX API) – Common API serving different file system types • Handles user calls related to file systems. – Implements generic FS actions – Directs request to specific code to handle the request • Associate (and disassociate) devices with instances of the appropriate file system.
  • 4. File System: VFS User Myapp.py: open(), read() glibc.so glibc.so: open(), read() System call interface System call: sys_open(), sys_read() VFS Kernel space VFS: vfs_open(), vfs_read() ext2 NFS procfs Block layer/ Device drivers/Hardware
  • 5. Developing FS in *NIX • In-kernel file-systems (traditionally) • It is a complex task • Understanding kernel libraries and modules • Development experience in kernel space • Managing disk i/o • Time consuming and tedious development – Frequent kernel panic – Higher testing efforts • Kernel bloating and side effects like security
  • 6. Solution: User space • In user space: – Shorter development cycle – Easy to update fixes, test and distribute – More flexibility • Programming tools, debuggers, and libraries as you have if you were developing standard *NIX applications • User-space file-systems – File systems become regular applications (as opposed to kernel extensions)
  • 7. FUSE (Filesystem in USErspace) • Implement a file system in user-space – no kernel code required! • Secure, non-privileged mounts • Useful to develop “virtual” file-systems – Allows you to imagine “anything” as a file ☺ – local disk, across the network, from memory, or any other combination • User operates on a mounted instance of FS: - Unix utilities - POSIX libraries
  • 8. FUSE: Diving deeper Myfile.py: open() CustomFS: open() glibc.so libfuse.so VFS: vfs_open() fuse.ko /dev/fuse
  • 9. FUSE | develop • Choices of development in C, C++, Java, … and of course Python! • Python interface for FUSE – (FusePython: most popularly used) • Open source project – http://fuse.sourceforge.net/ • For ubuntu systems: $sudo apt-get instatall python-fuse $mkdir ./mount_point $python myfuse.py ./mount_point $fusermount -u ./mount_point
  • 10. FUSE API Overview • File management – open(path) – create(path, mode) – read(path, length, offset) – write(path, data, offset) • Directory and file system management – unlink(path) – readdir(path) • Metadata operations – getattr(path) – chmod(path, mode) – chown(path, uid, gid)
  • 11. seFS – storage efficient FS • A prototype, experimental file system with: – Online data de-duplication (SHA1) – Compression (text based) • SQLite • Ubuntu 11.04, Python-Fuse Bindings • Provides following services: open() write() chmod() create() readdir() chown() read() unlink()
  • 12. seFS Architecture <<FUSE code>> Your myfuse.py application File System Operations <<SQLite DB Interface>> seFS.py storage Efficiency = De-duplication + <<pylibrary>> Compression SQLiteHandler.py Compression.py seFS DB Crypto.py
  • 13. seFS: Database CREATE TABLE metadata( CREATE TABLE data( "id" INTEGER, "id" INTEGER "abspath" TEXT, PRIMARY KEY "length" INTEGER, AUTOINCREMENT, "mtime" TEXT, seFS "sha" TEXT, "ctime" TEXT, "data" BLOB, "atime" TEXT, Schema "length" INTEGER, "inode" INTEGER); "compressed" BLOB); data table metadata table
  • 14. seFS API flow $touch abc $rm abc $cat >> abc User Operations getattr() getattr() getattr() open() create() access() seFS APIs write() open() unlink() flush() create() release() release() storage seFS DB Efficiency
  • 15. seFS: Code def getattr(self, path): sefs = seFS() stat = fuse.stat() stat.stat_ino = context = fuse.FuseGetContext() int(sefs.getinode(path)) #Root if path == '/': # Get the file size from DB stat.stat_nlink = 2 if sefs.getlength(path) is not None: stat.stat_mode = stat.S_IFDIR | 0755 stat.stat_size = else: int(sefs.getlength(path)) stat.stat_mode = stat.S_IFREG | 0777 else: stat.stat_nlink = 1 stat.stat_size = 0 stat.stat_uid, stat.stat_gid = return stat (context ['uid'], context else: ['gid']) return - errno.ENOENT # Search for this path in DB ret = sefs.search(path) # If file exists in DB, get its times if ret is True: tup = sefs.getutime(path) stat.stat_mtime = int(tup[0].strip().split('.')[0]) stat.stat_ctime = int(tup[1].strip().split('.')[0]) stat.stat_atime = int(tup[2].strip().split('.')[0])
  • 16. seFS: Code… def create(self, path,flags=None,mode=None): sefs = seFS() ret = self.open(path, flags) if ret == -errno.ENOENT: #Create the file in database ret = sefs.open(path) t = int(time.time()) mytime = (t, t, t) ret = sefs.utime(path, mytime) self.fd = len(sefs.ls()) sefs.setinode(path, self.fd) return 0 def write(self, path, data, offset): length = len(data) sefs = seFS() ret = sefs.write(path, data) return length
  • 17. seFS: Learning • Design your file system and define the objectives first, before development • Database schema is crucial • skip implementing functionality your file system doesn’t intend to support • Knowledge on FUSE API is essential – FUSE APIs are look-alike to standard POSIX APIs – Limited documentation of FUSE API • Performance?
  • 18. Conclusion • Development of FS is very easy with FUSE • Python aids RAD with Python-Fuse bindings • seFS: Thought provoking implementation • Creative applications – your needs and objectives • When are you developing your own File system?! ☺
  • 19. Further Read • Sample Fuse based File systems – Sshfs – YoutubeFS – Dedupfs – GlusterFS • Python-Fuse bindings – http://fuse.sourceforge.net/ • Linux internal manual
  • 20. Contact Us • Chetan Giridhar – http://technobeans.com – [email protected] • Vishal Kanaujia – http://freethreads.wordpress.com – [email protected]
  • 22. Agenda • The motivation • Intro to *NIX File Systems • Trade off: code in user and kernel space • FUSE? • Hold on – What’s VFS? • Diving into FUSE internals • Design and develop your own File System with Python-FUSE bindings • Lessons learnt • Python-FUSE: Creative applications/ Use-cases
  • 23. User-space and Kernel space • Kernel-space – Kernel code including device drivers – Kernel resources (hardware) – Privileged user • User-space – User application runs – Libraries dealing with kernel space – System resources
  • 24. FUSE: Internals • Three major components: – Userspace library (libfuse.*) – Kernel module (fuse.ko) – Mount utility (fusermount) • Kernel module hooks in to VFS – Provides a special device “/dev/fuse” • Can be accessed by a user-space process • Interface: user-space application and fuse kernel module • Read/ writes occur on a file descriptor of /dev/fuse
  • 25. FUSE Workflow Custatom File Systatem User (file I/O) User-space FUSE lib User space Kernel space Virtual File Systatem FUSE: kernel lib
  • 26. Facts and figures • seFS – online storage efficiency • De-duplication/ compression – Managed catalogue information (file meta-data rarely changes) – Compression encoded information • Quick and easy prototyping (Proof of concept) • Large dataset generation – Data generated on demand
  • 27. Creative applications: FUSE based File systems • SSHFS: Provides access to a remote file-system through SSH • WikipediaFS: View and edit Wikipedia articles as if they were real files • GlusterFS: Clustered Distributed Filesystem having capability to scale up to several petabytes. • HDFS: FUSE bindings exist for the open source Hadoop distributed file system • seFS: You know it already ☺