A Novel Distributed File System Using Blockchain Metadata
https://doi.org/10.1007/s11277-022-10108-2
Abstract
Cluster computing has become an inevitable part of data processing, as the volume of data produced from sources such as online social media, IoT, mobile data, sensor data, black box data and so on grows exponentially. A Distributed File System defines the methods used to distribute, read and delete files among the nodes of a compute cluster. Popular distributed file systems such as the Google File System and the Hadoop Distributed File System store their metadata centrally. This creates a single point of failure and gives rise to the need for backup and recovery mechanisms should the metadata server fail. In addition, the name node server is built from expensive, highly reliable hardware; for small and medium clusters it is not cost effective to maintain such a server, and although cheap commodity hardware could take over the name node's role, it is prone to hardware failure. This paper proposes a novel distributed file system that distributes files over a cluster of machines connected in a peer-to-peer network. Its most significant feature is the ability to distribute the metadata itself, using distributed consensus based on hash values. Although the distributed metadata is publicly visible, the methodology ensures that it is immutable and irrefutable. The proposed file system has been successfully tested on the Google Cloud Platform, and its basic operations (read, write and delete) with distributed metadata are compared with those of the Hadoop Distributed File System in terms of distribution time on the same cluster setup. The proposed distributed file system provides better results than the existing methodologies.
Keywords Cluster computing · Google file system · Peer-to-Peer network · HDFS · Single point of failure · Metadata
* Deepa S. Kumar, [email protected]
1 College of Engineering Munnar, Idukki, India
2 Centre for Development of Advanced Computing, Thiruvananthapuram, India
3 LBS Institute of Technology for Women, Thiruvananthapuram, India
4 LBS Centre for Science and Technology, Thiruvananthapuram, India
5 Amal Jyothi College of Engineering, Kanjirappally, Kottayam, India
1 Introduction
A Distributed File System (DFS) is a set of services running on a collection of nodes with the capability to distribute file contents while providing location transparency and availability. Other features include replica management, fault tolerance, data rebuild, and error detection and correction. The core of a DFS lies in its metadata and metadata management.
The first-generation distributed file systems, such as the Network File System (1974–95), were only network storage file systems. Later, with the development of distributed object systems, persistently stored datasets with a visible namespace were introduced, along with data sharing between users through access mechanisms such as read-only access, concurrent access, access control and file mounting [1].
Subsequently, new architectures were proposed and implemented over high-speed networks with data distributed among different servers, and eventually P2P architectures appeared in which the service was distributed at the level of individual files. Initial file system designs satisfied transparency, heterogeneity, efficiency and fault tolerance, with only limited support for concurrency, replication, consistency and security.
The emergence of Big Data initially led to the Hadoop ecosystem with two main components: a storage component, known as the Hadoop Distributed File System (HDFS), in which the data are distributed, and a processing component, known as Map Reduce, which is the most popular processing framework. The HDFS architecture follows a client-server model with the name node as the server and the data nodes as slaves [2, 3]. The name node keeps all the metadata about the huge collection of data stored on the cheap commodity hardware of the data nodes, and because it is a single point of failure it is built with expensive, reliable hardware.
The main motivation for decoupling metadata from data was better performance, especially on very large clusters such as Yahoo's. When the entire metadata is kept in the RAM of a central server, metadata processing is fast while the data nodes simultaneously handle multiple read and write operations. However, horizontal scalability is limited by the RAM capacity of that server [4]. Hence several proposals and implementations of distributed metadata and partitioned metadata servers have evolved and are still developing. The Giraffa File System is one of the latest projects in this direction, based on metadata dynamically partitioned over a cluster of metadata servers [5].
For relatively small clusters, the need to maintain an expensive component just to keep the metadata is another major issue pointed out in this paper. The HDFS design mainly focused on overall system throughput rather than individual operation latency [4].
This paper proposes a new file system without a central name node for metadata storage and processing. It outlines the implementation details of the basic operations of the proposed DFS, such as read, write, delete, metadata creation and metadata distribution. The read and write performance of the proposed DFS is compared with that of traditional HDFS. The main feature of the proposed system lies in distributing the metadata in an immutable manner instead of keeping it centrally as in HDFS. The software architectural components are also explored in detail.
A study of existing data-parallel storage architectures, covering the Google File System, Hadoop Distributed File System, Cassandra File System, Ignite File System and Giraffa File System, is presented below.
GFS, developed by Google, was implemented for storage on a cluster with the capability to store very large files, and was designed for system-to-system rather than user-to-system communication. GFS was developed as a scalable distributed file system for large, distributed, data-intensive applications [6]. It provides fault tolerance, runs on commodity hardware, and is optimized for a write-once-read-many access pattern. GFS exhibits scalability, reliability, availability and good performance. Handling component failures and huge volumes of data are the main challenges addressed by GFS [7]. It provides a file system interface for all operations on files.
Figure 1 shows the architectural components of GFS, which was mainly intended as Google's core data storage for its search engine. A GFS cluster is organized as a master node, which keeps track of the metadata, and a collection of chunk servers. Files are partitioned into chunks with a default size of 64 MB, and each chunk is assigned a unique 64-bit chunk handle by the master node for mapping the data chunks. Reliability is ensured by replicating each chunk on multiple chunk servers. The master maintains all file system metadata; all metadata are kept in the master's memory, so master operations are fast. As a backup mechanism in case of master node failure, the first two types of metadata are also kept in an operation log stored on the master's local disk and replicated on remote machines [6]. The chunk location information is known to the chunk servers and is requested by the master at startup and whenever a chunk server joins the cluster. The permissible operations are read and append. Data replication is handled automatically by the system, which maintains at least three copies. GFS achieves comparable reading performance but is relatively slow at writing data to files because of the verification procedure involving the modifying chunk master (shadow master).
Fig. 1 GFS Architecture
HDFS, an advanced implementation of the GFS ideas, addressed the main issues of data locality and data replication [8]. HDFS is the most commonly used storage layer underlying Map Reduce programming (HDFS/Map Reduce). HDFS is organized as a master/slave architecture with a central server, the name node, which keeps the metadata; a secondary name node acting as the checkpointing node; and a cluster of commodity machines known as data nodes, which store the data. In this architecture the name node and secondary name node are expensive to maintain. In earlier versions of Hadoop there were two daemons, the job tracker on the name node and the task tracker on the data nodes, for resource allocation and task processing. Hadoop 2.x introduces a component known as YARN, which acts as the resource manager and takes over the role of the job tracker daemon on the name node and of the task tracker on the data nodes. The metadata, with details such as HDFS location, filename, replicas and path, is kept in the name node. The primary issues of large-scale data management, namely fault tolerance, scalability and reliability, are handled well. Figure 2 depicts the HDFS architecture.
Fig. 2 HDFS Architecture (https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html)
The major drawback of the HDFS architecture is that the central name node is a single point of failure, and for extremely large clusters the single name node limits the scalability of the cluster. Another issue is that the entire HDFS workflow depends on the metadata, which is not sufficiently secured; privacy and anonymity are not addressed in the existing HDFS architecture. HDFS mainly targets overall throughput on large clusters, so the latency and the expense of maintaining the main cluster servers, the name node and secondary name node, appear high for relatively small clusters.
Apache Ignite provides a unique in-memory-computing distributed file system, IGFS, for improving big data processing performance [9]. It is an API layered on top of HDFS that can be plugged into Apache Hadoop or Spark, as shown in Fig. 3. IGFS does not need a name node; it determines file data locality automatically using a hashing function. The IGFS architecture eliminates the overhead associated with the job tracker and task trackers of the HDFS architecture, thereby providing low-latency, high-performance distributed processing. However, the Ignite nodes used by IGFS are relatively expensive, and IGFS is not suitable for data-intensive processing.
CFS offers better storage than HDFS by eliminating the name node, the secondary name node and the job tracker. CFS is represented as a keyspace with two column families in Cassandra. Like HDFS, it provides replica management, with the corresponding settings configured on the keyspace. The two column families are the inode column family, which tracks each file's metadata and block locations, and the sblock column family, which stores the file blocks. The CFS architecture is both fault tolerant and scalable, and the metadata is stored in the inodes.
CFS is organized as a serverless network instead of the master/slave configuration of HDFS, and the Gossip protocol is used for communication among peers. When a write request arrives at an analytic node, as depicted in Fig. 4, CFS writes the metadata information into a table called inode. Blocks are created, each subblock is given an ID number, and they are written to Cassandra. To handle a read request, the metadata table is consulted to select the blocks, which introduces delays in disk I/O operations; the time spent searching the metadata in the NoSQL database also adds to the execution time [10].
HDFS keeps all the metadata and the entire namespace in the RAM of a single name node, which limits the growth of the file system to about 20 PB [12]. Adding more nodes by dynamically partitioning the namespace led to the development of a new distributed file system called the Giraffa File System. It is a highly available distributed file system that builds on the features of HDFS and HBase. Ceph [13], Lustre [14] and CassandraFS [10] are other distributed file systems with a distributed namespace. GiFS is designed with minimal changes to the existing components of HDFS and HBase, and the project is still experimental. The Giraffa requirements include metadata scalability, load balancing and speed. GiFS is implemented with Giraffa clients that read the metadata from the HBase database and then exchange file data directly with the data nodes (Fig. 5).
The distributed block management module in GiraffaFS supports block managers that maintain a flat namespace of blocks and manage block allocation, replication, removal, data node management and the storage of namespace metadata in HBase.
The architectural diagram illustrates these functionalities. Giraffa is still experimental and suits large clusters comprising thousands of nodes. Instead of keeping the metadata centrally, it is dynamically partitioned and stored across multiple metadata servers, and the file system handles load balancing as well.
The most important distributed file systems are the Google File System (GFS) [6], Cassandra File System (CFS) [10], Ignite File System (IGFS) [9], Hadoop Distributed File System (HDFS) [11] and Giraffa File System (GiFS) [5]. This paper proposes a Distributed File System with Distributed Metadata (DFS-DM) that provides data recovery through replica management, error detection through a Cyclic Redundancy Check (CRC) and block rebuild using parity block addition.
The software offers various file services such as storage, extraction and deletion of data across the cluster. It manages the storage of several nodes connected by a P2P network and offers a file system interface to the DFSClients.
3.1.1 Components
The major software components of DFS-DM are the DFSClient, the DFSAdmin and the DFS module, as shown in Fig. 6. The DFSClient issues a read/write/delete request to the DFSAdmin, which responds with an rObj/wObj. The DFSClient then interacts with the DFS module using the read or write object given by the DFSAdmin. If the DFSClient issues a delete request, the DFSAdmin interacts with the DFS module directly: the DeleteHandler of the DFS module produces a delObj with the details of the file to be deleted and acts as the interface to the cluster, where the actual data resides.
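The paper does not give the exact structure of the rObj, wObj and delObj handles exchanged between the DFSClient, DFSAdmin and DFS module; the following minimal Python sketch shows one plausible shape for them, with all field names being our assumptions rather than the paper's design.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class WriteObject:            # wObj returned by DFSAdmin for a write request
    file_name: str
    block_size: int = 128 * 1024 * 1024   # 128 MB blocks, as used by the BlockGenerator
    data_node_ips: List[str] = field(default_factory=list)

@dataclass
class ReadObject:             # rObj returned by DFSAdmin for a read request
    file_name: str
    metadata_block_hash: str  # locates the metadata block on the chain (assumed)

@dataclass
class DeleteObject:           # delObj created by the DeleteHandler
    file_name: str
    block_ids: List[str] = field(default_factory=list)
```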
The DFS module includes the following important functional blocks.
3.1.1.1 BlockGenerator This software module reads the input file submitted by the DFSClient and splits it into blocks of 128 MB. The blocks are stored as separate files of 128 MB each, except for the last block, which may be smaller.
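As a rough illustration of the BlockGenerator behaviour (the function and file-naming scheme below are ours, not the paper's), splitting an input file into 128 MB block files might look like this:

```python
import os

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size used by the BlockGenerator

def generate_blocks(input_path: str, out_dir: str) -> list:
    """Split input_path into 128 MB block files; the last block may be smaller."""
    os.makedirs(out_dir, exist_ok=True)
    block_paths = []
    with open(input_path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(BLOCK_SIZE)
            if not chunk:
                break
            block_path = os.path.join(out_dir, f"blk_{index:06d}")
            with open(block_path, "wb") as dst:
                dst.write(chunk)
            block_paths.append(block_path)
            index += 1
    return block_paths
```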
3.1.1.2 ParityGenerator The ParityGenerator module creates a parity block for every 10 consecutive data blocks and stores it as a separate file alongside the data blocks. Parity generation is based on an XOR operation over the group of 10 consecutive 128 MB blocks, so the parity block is of the same size.
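A minimal sketch of the XOR parity computation described above, assuming the blocks are available in memory as byte strings and that a shorter final block is treated as zero-padded:

```python
def xor_parity(blocks: list) -> bytes:
    """XOR up to 10 consecutive data blocks into one parity block.

    Shorter blocks (e.g. the last block of a file) are treated as
    zero-padded, so the parity is as long as the longest block.
    """
    length = max(len(b) for b in blocks)
    parity = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

# Any single lost block can be rebuilt by XORing the parity block with
# the remaining blocks of the same group of 10.
```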
3.1.1.3 IPPieceMapper & Replica Manager This software module prepares the list of IP addresses of the available data nodes and incorporates a scheduling policy that maps the blocks generated by the BlockGenerator to the list of available IPs.
IPPieceMapper maps the data blocks (primary and secondary copies) to the available IPs. The Replica Manager manages the number of block replicas to be maintained; by default, two copies of each block are kept, known as the primary copy and the secondary copy.
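The exact scheduling policies of Tables 1 and 2 are not reproduced here; the sketch below shows a simple illustrative round-robin mapping in which the secondary copy of each block is placed on the node following its primary. This is an assumption for illustration, not the paper's schedule.

```python
def map_blocks_to_nodes(block_ids, node_ips, replicas=2):
    """Round-robin placement: primary on node i, secondary on node i+1.

    Illustrative policy only; Tables 1 and 2 of the paper define the
    actual schedules used by IPPieceMapper.
    """
    placement = {}
    n = len(node_ips)
    for i, block_id in enumerate(block_ids):
        copies = [node_ips[(i + r) % n] for r in range(replicas)]
        placement[block_id] = copies  # [primary_ip, secondary_ip]
    return placement

# Example:
# map_blocks_to_nodes(["blk_0", "blk_1"], ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
# -> {"blk_0": ["10.0.0.1", "10.0.0.2"], "blk_1": ["10.0.0.2", "10.0.0.3"]}
```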
expense of three nodes, whereas in the schedule given in Table 1 only a single node has to recover all the replicas of a failed node.
Fig. 7 Metadata blocks (Mblk1–Mblk5: metadata blocks; F1–F5: files; H1: current hash)
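The paper does not spell out the layout of an individual metadata block; the sketch below shows one way a hash-chained metadata block of the kind depicted in Fig. 7 could be created and validated. The field names and the use of SHA-256 over a JSON serialization are assumptions, not the paper's design.

```python
import hashlib
import json
import time

def make_metadata_block(file_name, block_placement, prev_hash):
    """Create one metadata block chained to the previous block by its hash."""
    block = {
        "file_name": file_name,
        "block_placement": block_placement,  # block id -> [primary_ip, secondary_ip]
        "timestamp": time.time(),            # later used to validate / invalidate the block
        "prev_hash": prev_hash,
    }
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()  # current hash (H1 in Fig. 7)
    return block

def chain_is_valid(chain):
    """Recompute each block hash and check the prev_hash links."""
    for i, block in enumerate(chain):
        body = {k: v for k, v in block.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode()
        if hashlib.sha256(payload).hexdigest() != block["hash"]:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True
```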
3.1.1.7 FileWriter The FileWriter acts as a file controller that accepts the wObj (issued by the DFSAdmin) from the DFSClient. The FileWriter accepts data from the DFSClient, the BlockGenerator computes the required number of block files, and the data are kept in a block queue until the last block of the file has been read. For better performance, the data are written to n + 1 disks as separate files, with n disks storing the data simultaneously and one disk storing the parity. The parity blocks are generated for every 10 consecutive data blocks and are distributed among the n + 1 disks. For each block of data received, a checksum is computed by the CRC module. The FileWriter therefore converts the file into n block files + 1 parity file + 1 checksum file. Any errors encountered can be detected during reading with the checksum file, and the affected block can be rebuilt from the parity file using the XOR calculation. The FileWriter then fetches the list of IPs of the data nodes available on the cluster from the IPPieceMapper, and finally all three kinds of files (block, parity and checksum) corresponding to the original file are written to the data nodes. The files are replicated with a factor of 2, as a primary copy and a secondary copy, and the distribution of the two copies follows either Table 1 or Table 2.
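The paper does not state which CRC variant the CRC module uses; as a minimal sketch, the per-block checksum file could be produced with Python's built-in CRC-32 as follows (the function and file format are our assumptions):

```python
import zlib

def crc_file_for_blocks(block_paths, crc_path):
    """Write one CRC-32 value per data block into a checksum file."""
    with open(crc_path, "w") as out:
        for path in block_paths:
            with open(path, "rb") as f:
                crc = zlib.crc32(f.read()) & 0xFFFFFFFF
            out.write(f"{path} {crc:08x}\n")
```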
3.1.1.8 FileReader The FileReader is the file controller that assists the DFSClient in performing an error-free read operation. When the DFSClient receives the rObj from the DFSAdmin, it passes the rObj to the FileReader, which then fetches the metadata from the respective data node. After the metadata is obtained from the blockchain, the block is validated using the time stamp field of the blockchain; the FileReader then reads the data blocks, places them on the data queue and verifies them using the checksum file. If an error is found, the corrupted block is rebuilt using the information from the parity file. If the data nodes holding the primary copy fail to return the data blocks, the FileReader issues requests to read the data blocks from the secondary copy.
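A hedged sketch of the verify-then-rebuild path described above, reusing the XOR convention from the ParityGenerator sketch; the function names, the CRC-32 choice and the zero-padding handling are assumptions:

```python
import zlib

def verify_block(data: bytes, expected_crc: int) -> bool:
    """Check one data block against the CRC-32 value stored in the checksum file."""
    return (zlib.crc32(data) & 0xFFFFFFFF) == expected_crc

def rebuild_block(parity: bytes, other_blocks: list, original_len: int) -> bytes:
    """Rebuild one corrupted or missing block of a 10-block group by XOR."""
    rebuilt = bytearray(parity)
    for block in other_blocks:            # the 9 surviving blocks of the group
        for i, byte in enumerate(block):
            rebuilt[i] ^= byte
    return bytes(rebuilt[:original_len])  # drop the zero padding, if any
```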
3.1.1.9 DeleteHandler As soon as the DFSClient issues the delReq to the DFSAdmin, the admin passes the request to the DFS module. The DeleteHandler of the DFS module creates a delObj and passes this object to the cluster nodes. Each data node in the cluster verifies the availability of the blocks using the metadata. If the blocks are found, they are marked for deletion by setting an invalid time stamp, and this information is propagated
over the P2P network. The DFS module then informs the DFSAdmin that the deletion was successful, and the DFSAdmin returns a delete-successful message to the DFSClient. If the blocks are not found, the DFS module notifies the admin, which returns an unsuccessful message to the DFSClient.
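As a rough sketch of the marking-for-deletion step: how the invalidation is recorded on the metadata chain (for instance, whether it is appended as a new block) is not specified in the paper, so the sentinel value and the broadcast placeholder below are assumptions.

```python
INVALID_TIMESTAMP = -1.0   # sentinel meaning "block marked for deletion" (assumed)

def mark_for_deletion(metadata_block, peers, broadcast):
    """Invalidate a metadata block and propagate the change to peers.

    `broadcast(peer, block)` is a placeholder for the P2P send primitive.
    """
    metadata_block["timestamp"] = INVALID_TIMESTAMP
    for peer in peers:
        broadcast(peer, metadata_block)
    return metadata_block
```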
The block diagram of the DFS module illustrating different functionalities is shown in Fig. 9.
Fig. 9 Functional components of DFS-Module
The basic operations of DFS-DM are reading, writing and deleting files from/to the distributed file system; the detailed workflows for these operations are depicted in Figs. 10, 11 and 12 respectively.
HDFS, by default, maintains three copies of each block (two replicas in addition to the original) and can tolerate two node failures. The storage space needed by HDFS is therefore three times the original data, whereas in DFS-DM the space required is only two times, plus a comparatively small amount for parity and checksum storage.
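As a rough back-of-the-envelope check of this claim, assuming one parity block per 10 data blocks (about 10 % of one copy) and ignoring the small checksum files:

```python
def storage_needed(data_tb: float):
    """Return (HDFS storage, DFS-DM storage) in TB for data_tb TB of user data."""
    hdfs = 3 * data_tb                    # original + two replicas
    dfs_dm = 2 * data_tb + 0.1 * data_tb  # two copies + ~10 % parity (assumed)
    return hdfs, dfs_dm

# storage_needed(1.0) -> (3.0, 2.1): roughly 3 TB for HDFS vs about 2.1 TB for DFS-DM
```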
5 Results and Discussion
All the software components of DFS-DM were implemented and tested as described in the previous sections. Performance testing was carried out on a local cluster of four PCs, each with an Intel Core i5 processor, 4 GB RAM and a 500 GB HDD, networked through a D-LINK 10/100 Mbps switch. The time taken both to distribute different datasets to the cluster (PUT time) and to collect the datasets from the cluster (GET time) was measured. The PUT/GET times were also measured on an HDFS big data Hadoop cluster with one name node and three data nodes.
The performance of the proposed system is evaluated by measuring the time to add a file to the cluster (AddFile) and to get a file from the cluster (GetFile), and the results are compared with HDFS. DFS-DM shows PUT/GET times very close to those measured on HDFS, as shown in Fig. 13.
The results show that the existing distributed file systems and the proposed file system take approximately the same time to complete reads and writes on the same cluster setup. However, the existing HDFS has to maintain expensive components such as the name node, secondary name node, stand-by passive name node and name node clusters in different configurations, whereas the proposed system achieves similar performance with a P2P network configuration, so no expensive servers are needed for metadata management. The expensive hardware of the existing systems can thus be replaced by software-based blockchain metadata creation and distribution. The metadata maintained on the blockchain is immutable and anonymous by design, and the validity of the metadata can be checked through the time stamp available on each block.
Fig. 13 Comparison of distribute/collect time in HDFS vs DFS-DM
Fig. 14 Comparison of execution time on HDFS/Spark vs DFS-DM/MPI for sentiment analysis
Sentiment analysis is the process of recognizing the opinion expressed in a sentence; the sentiment can be positive, negative or neutral [16]. The sentiment analysis application, written in Scala, was tested with different file sizes and its execution time evaluated in both the Spark and MPI frameworks, as shown in Table 3. A performance comparison between HDFS/Spark and the proposed DFS-DM in the MPI framework shows that DFS-DM/MPI outperforms HDFS/Spark by approximately one order of magnitude, as shown in Fig. 14.
7 Conclusion
C++ was used for the implementation in MPI. The execution time was analysed on Spark and on MPI. A detailed quantitative analysis of Apache Spark versus MPI was carried out on a dataset of up to 1 TB [17], and the results show that MPI execution is roughly 1.5 times faster than Spark processing. Hence, the proposed blockchain-based distributed file system library can be integrated into any big data or HPC application on low-cost hardware without compromising read and write performance.
As future work, the proposed decentralized infrastructure can be implemented using the InterPlanetary File System (IPFS) [18]. IPFS is a protocol and peer-to-peer network for storing and sharing data in a distributed file system, but it cannot store large files directly. It uses content addressing to uniquely identify each file in a global namespace connecting all computing devices. Files are stored inside IPFS objects of up to 256 KB each, and IPFS objects can also contain links to other IPFS objects. The major challenge in an IPFS implementation is that a file larger than 256 KB, such as an image or a video, is split into multiple IPFS objects, and the system creates an additional IPFS object, containing no file data itself, that links to all the other pieces of the file. Each object is hashed and given a unique content identifier, which serves as a fingerprint; this makes it fast and easy to store the small pieces of data on the network. However, if the linking object goes offline, the file becomes difficult to access. Hence the metadata must be replicated in IPFS, which increases the cost of operating the blockchain.
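To make the chunking scheme concrete, the following sketch imitates the IPFS object model in plain Python: the file is split into 256 KB pieces, each piece is content-addressed by a hash, and a root object holds only the links to the pieces. This illustrates the model only; it is not the IPFS implementation or its API, and the use of SHA-256 as the content identifier here is an assumption.

```python
import hashlib

CHUNK = 256 * 1024   # 256 KB payload limit per IPFS object

def content_id(data: bytes) -> str:
    """Content identifier: here simply the SHA-256 digest of the payload."""
    return hashlib.sha256(data).hexdigest()

def add_file(path: str, store: dict) -> str:
    """Split a file into 256 KB objects plus a root object linking to them."""
    links = []
    with open(path, "rb") as f:
        while True:
            piece = f.read(CHUNK)
            if not piece:
                break
            cid = content_id(piece)
            store[cid] = piece          # leaf object: raw 256 KB piece
            links.append(cid)
    root = "\n".join(links).encode()    # root object: only links, no file data
    root_cid = content_id(root)
    store[root_cid] = root
    return root_cid                      # fingerprint used to fetch the whole file
```

If the node holding the root object goes offline, the links are lost even though the leaf objects may still exist, which is exactly the availability concern raised above.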
Funding No funding.
Declarations
Conflict of interest No conflicts of interest to disclose.
Human and Animal Rights Humans and animals are not involved in this research work.
References
1. Li, X. S., et al. (2011). Analysis and simplification of three-dimensional space vector PWM for three-phase four-leg inverters. IEEE Transactions on Industrial Electronics, 58, 450–464.
2. https://www.slideshare.net/wahabtl/chapter-8-distributed-file-systems
3. Shvachko, K., et al. (2010). The Hadoop distributed file system. Yahoo!, Sunnyvale, CA, USA.
4. White, T. (2009). Hadoop: The definitive guide. O'Reilly Media / Yahoo! Press.
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.