Various Filesystems in Hadoop

Last Updated: 15 Jul, 2025

Hadoop is an open-source software framework, written mostly in Java with some shell scripting and C code, for performing computation over very large data. Hadoop is used for batch/offline processing across a network of many machines forming a physical cluster. The framework provides both distributed storage and distributed processing over the same cluster. It is designed to run on cheap systems, commonly known as commodity hardware, where each machine contributes its local storage and computation power.

Hadoop can run over a variety of file systems, and HDFS is just one implementation among them. Hadoop defines an abstract notion of a filesystem with several concrete implementations; the Java abstract class org.apache.hadoop.fs.FileSystem represents a filesystem in Hadoop.

| Filesystem | URI scheme | Java implementation (all under org.apache.hadoop) | Description |
|---|---|---|---|
| Local | file | fs.LocalFileSystem | Used for a locally connected disk with client-side checksumming. The local filesystem without checksums is RawLocalFileSystem. |
| HDFS | hdfs | hdfs.DistributedFileSystem | The Hadoop Distributed File System, designed to work efficiently with MapReduce. |
| HFTP | hftp | hdfs.HftpFileSystem | Provides read-only access to HDFS over HTTP. Despite the name, it has no connection with FTP. Commonly used with distcp to copy data between HDFS clusters running different versions. |
| HSFTP | hsftp | hdfs.HsftpFileSystem | Provides read-only access to HDFS over HTTPS. Likewise, it has no connection with FTP. |
| HAR | har | fs.HarFileSystem | A filesystem layered on another filesystem for archiving files. Hadoop Archives pack many files into one, reducing the NameNode's memory usage. |
| KFS (Cloud-Store) | kfs | fs.kfs.KosmosFileSystem | Cloud-Store, or KFS (Kosmos File System), is a distributed filesystem written in C++, similar to HDFS and GFS (Google File System). |
| FTP | ftp | fs.ftp.FTPFileSystem | A filesystem backed by an FTP server. |
| S3 (native) | s3n | fs.s3native.NativeS3FileSystem | A filesystem backed by Amazon S3. |
| S3 (block-based) | s3 | fs.s3.S3FileSystem | Also backed by Amazon S3, but stores files in blocks (much like HDFS) to overcome S3's 5 GB file-size limit. |

Hadoop provides numerous interfaces to these filesystems, and it generally uses the URI scheme to pick the right filesystem instance to communicate with. Any of these filesystems can be used with MapReduce when processing very large datasets, but distributed filesystems with data-locality features, such as HDFS and KFS (Kosmos File System), are preferable.

Author: dikshantmalidev
Article Tags: Data Engineering
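The URI-scheme-based lookup described above can be sketched in plain Java. This is only an illustration: the `SchemeLookup` class and its hard-coded table are invented for this example, while real Hadoop resolves the implementation class from its `Configuration` (and falls back to `fs.defaultFS` when a path has no scheme).

```java
import java.net.URI;
import java.util.Map;

public class SchemeLookup {
    // Hypothetical scheme-to-class table mirroring the table above;
    // real Hadoop reads these mappings from its Configuration instead.
    static final Map<String, String> IMPLS = Map.of(
        "file", "org.apache.hadoop.fs.LocalFileSystem",
        "hdfs", "org.apache.hadoop.hdfs.DistributedFileSystem",
        "hftp", "org.apache.hadoop.hdfs.HftpFileSystem",
        "har",  "org.apache.hadoop.fs.HarFileSystem",
        "ftp",  "org.apache.hadoop.fs.ftp.FTPFileSystem",
        "s3n",  "org.apache.hadoop.fs.s3native.NativeS3FileSystem");

    // Extract the URI scheme and map it to an implementation class name.
    // A path with no scheme (e.g. "/tmp/data") falls back to "file" here.
    static String implFor(String path) {
        String scheme = URI.create(path).getScheme();
        return IMPLS.getOrDefault(scheme == null ? "file" : scheme,
                                  "unknown scheme: " + scheme);
    }

    public static void main(String[] args) {
        // prints org.apache.hadoop.hdfs.DistributedFileSystem
        System.out.println(implFor("hdfs://namenode:9000/user/data"));
        // prints org.apache.hadoop.fs.s3native.NativeS3FileSystem
        System.out.println(implFor("s3n://bucket/logs/part-0000"));
    }
}
```

The hostname `namenode` and port `9000` are placeholders; the point is only that the part before `://` selects which FileSystem subclass handles the path.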
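Since paths without a scheme fall back to the default filesystem, the choice of default matters in practice. A minimal core-site.xml sketch setting HDFS as the default filesystem might look like the following (the hostname and port are placeholders for your own NameNode address):

```xml
<!-- core-site.xml: namenode host/port below are placeholders -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>
```

With this in place, a scheme-less path such as /user/data is resolved against HDFS, while a fully qualified URI such as file:///tmp/data still selects the local filesystem.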