QuickPeer: A P2P Framework for Distributed Application Development
ABSTRACT
Recently, P2P computing has attracted considerable attention from both the distributed computing
community and millions of Internet users. This paper describes a pure P2P file
sharing system, QuickPeer, which was originally designed to satisfy the requirements of a graduate
computer science course. QuickPeer is not only a P2P application but also a software
framework for distributed application development.
1. INTRODUCTION
Recently, P2P computing has attracted considerable attention from both the distributed computing
community and millions of Internet users. The emergence of the peer-to-peer concept
has dramatically changed the way people share information. By contributing some
information or a small portion of computing power (or other resources), P2P users can obtain many useful
and interesting resources from “unknown” owners almost “for free”. P2P computing provides a set
of fundamental Internet services [2], such as discovering, searching and sharing, and in doing so
it gathers a huge number of “tiny” resources or computing capacities and builds up an “unlimitedly
large” information center or a “supercomputer”.
Much work has been done in the area of P2P file sharing. Existing systems can be
roughly divided into two categories: centralized control and pure (decentralized) P2P. The most
famous system in the first category is Napster, mainly an online music sharing system. Recently,
however, more and more effort has gone into studying systems in the second category,
which include Gnutella, FreeNet and Chord. Decentralization gives them a good
chance of achieving greater scalability, which is often considered a fundamental argument for
using distributed systems instead of traditional client-server systems. However, decentralized
systems are much harder to build. The primary concern is performance: without the
help of a centralized monitor, it is hard to provide an efficient search service and
automatic load balancing. Fault tolerance is also a major issue, because users in a pure P2P group
tend to be much more “unreliable” than a centralized monitor (server), and they join and leave the
group quite often. The problem becomes how to tolerate these “high frequency” failures and still
provide a satisfactory service to the rest of the users.
This paper describes a decentralized P2P file sharing system that we developed as a graduate
CPSC508 Operating Systems Course Project Report
course project. It is not a P2P application in the traditional sense; instead, it is more like a
software framework for distributed application development. It can still be treated as a
P2P application because we also customized the framework into a file sharing system based on
a Distributed Hash Table (DHT). Section 2 describes the software architecture of the system. The
DHT algorithm, along with some modifications and optimizations, is introduced in section 3.
Section 4 discusses some interesting properties of QuickPeer. Section 5 describes some of the
lessons we learned during the implementation.
2. SOFTWARE ARCHITECTURE
As mentioned above, our system is more like a framework for distributed application
development. Its architecture is carefully designed so that it can readily be
customized for a specific application domain or extended with a new service algorithm. As with
any regular framework, all a developer needs to do to accomplish these tasks
is to design an application domain or a service algorithm and
realize it by subclassing some of the “ready to be subclassed” classes and implementing some of
the interfaces (in Java terms). This section outlines the design.
2.1 Overview
Figure 1(a) is a conceptual component diagram of our system and of many other distributed systems.
The idea is to separate the application layer from the peer group with a service layer. The service
layer defines and maintains the architecture of the peer group, including how nodes are linked
together, how group membership is maintained, etc. (Note that it is not restricted to a P2P group;
a client-server architecture can also be modelled in this way.) The service layer also provides a
service interface to the application layer, exposing all the available services. Typical services
include upload, download, login, join, search and discover. The advantage of this design is that the
service layer separates the design of the application from the core (or kernel).
However, this architecture has a problem: the application layer and the service layer are
tightly coupled, and so are the service layer and the peer group. This tight coupling conflicts with
our goal of providing a highly flexible and reusable framework, in which any change should require as
few modifications as possible. It turns out that this problem can be minimized (or at least made less
problematic) by making two modifications. Figure 1(b) presents the actual architecture of our final
system. The two changes are the introduction of a controller layer and the extension of the service
layer to include some parts of the original group layer.
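[Figure 1: (a) conceptual architecture; (b) actual architecture of QuickPeer. Layer labels
recoverable from the figure: Application, Application (UI), Service, Controller, Communication,
Network.]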
In our system, the service layer takes over part of the responsibilities that originally belonged to
the group layer. Note that in the conceptual architecture (see figure 1(a)), the group layer actually
consists of two sub-layers: the logical communication and formation sub-layer (CFSL) and the
node communication sub-layer (NCSL). The CFSL groups the underlying nodes into a certain
logical formation and describes how they should interact. It often depends heavily on the service
algorithm, so deploying a new service algorithm usually requires this sub-layer to be
rebuilt. The NCSL interacts with the underlying communication layer and models
the actual coordination behaviour. We therefore decided to push the functionality of the CFSL into
the service layer. This complicates the design of the service layer, but since a new CFSL would have
to be designed anyway, we claim that it does not complicate the overall design. In return, the NCSL can
remain untouched, which simplifies the overall design and decreases the number of changes.
Instead of letting the service layer take full control of the underlying layers, we decided to give
the controller layer some direct control over the group architecture layer. The reason is again to
guarantee and improve flexibility and reusability. Conceptually, a “pluggable” service layer should
only be concerned with coordinating nodes and maintaining the group architecture. The
details of how to maintain the behaviour of a particular node running on a PC can
be handled in the controller. This does not mean that the service layer actually loses
control in our framework; our open architecture makes it easy to regain that control
by subclassing the controller and blocking its interaction with the underlying layers.
Kernel component: This component defines the notions of node and peer group. These definitions
are somewhat abstract, and subclassing is needed to define how nodes are linked together to build a
peer group. Note that this component provides a sufficiently detailed implementation of the NCSL.
Service component: This component defines the behaviour of the whole system. It not only
determines what a system can do and how different systems should be combined, but also
indirectly constrains the valid and reasonable application domains. The choice among P2P
with centralized control, pure P2P with DHT, and the client-server model has a huge impact on
other parts of the system. Importantly, the service component should implement the
“Strategy” interface in order to satisfy the requirements of the strategy pattern.
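The role of the “Strategy” interface can be illustrated with a minimal sketch. The method names,
the CentralizedStrategy class and the Controller shown here are assumptions made for illustration
only; the paper does not list the actual QuickPeer signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape of the "Strategy" interface; the method names are assumptions.
interface Strategy {
    void upload(String name, byte[] data);
    byte[] download(String name);   // returns null when the resource is unknown
}

// A toy client-server style strategy: every resource lives in one central map.
class CentralizedStrategy implements Strategy {
    private final Map<String, byte[]> store = new HashMap<>();

    public void upload(String name, byte[] data) {
        store.put(name, data.clone());
    }

    public byte[] download(String name) {
        byte[] data = store.get(name);
        return data == null ? null : data.clone();
    }
}

// The controller depends only on the interface, so the strategy can be swapped
// (for example for a DHT-based one) without touching the application layer.
class Controller {
    private Strategy strategy;

    Controller(Strategy strategy) { this.strategy = strategy; }

    void setStrategy(Strategy strategy) { this.strategy = strategy; }

    void share(String name, byte[] data) { strategy.upload(name, data); }

    byte[] fetch(String name) { return strategy.download(name); }
}
```

Under these assumptions, replacing the service algorithm amounts to passing a different Strategy
implementation to the controller.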
3. SERVICE ALGORITHM
To produce “workable” software and to demonstrate the power and
flexibility of our framework, we also implemented a service algorithm based on a DHT engine. We
borrowed the rough idea from CHORD [3] and made some modifications and optimizations. Note
that our implementation of the CHORD DHT engine might look totally different from the actual
CHORD version. The reason is that we only adopted its idea of maintaining group membership and
allocating resources, and nothing from its implementation. Apart from these rough ideas about DHT
maintenance, our system has nothing in common with existing CHORD applications.
The algorithm organizes the participant nodes into a ring, which we call the C-Ring (Chord-Ring). Figure 2 shows the graphical
representation of this C-Ring. Each participant in the group is assigned an ID. In our algorithm,
this ID is generated by computing the SHA-1 hash of the node’s original ID, which is a string
consisting of the node’s IP address, service port and name. The computed hash is 160
bits long, which in theory can support 2^160 nodes. We therefore feel it is safe to argue that a
single node ID is globally unique with extremely high probability. The position of each node in the
ring depends on the numerical value of its node ID. The reason for using SHA-1 hashing is discussed
in section 4.
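The ID generation described above can be sketched as follows. The class name NodeIds, the ':'
separator and the field order are assumptions; the paper only states that the original ID string
consists of the IP address, the service port and the node name.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class NodeIds {
    // Hashes "ip:port:name" with SHA-1 and returns the 160-bit digest as a
    // non-negative integer, so that node IDs can be ordered numerically on the ring.
    // The separator and field order are assumptions made for this sketch.
    static BigInteger nodeId(String ip, int port, String name) {
        String original = ip + ":" + port + ":" + name;
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(original.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1 keeps the value non-negative
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }
}
```

Representing the digest as a BigInteger is one possible design choice; it makes the numerical
comparison of IDs straightforward.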
Each node in the ring keeps the information (IP address, name, port, etc.) about its
immediate successor. As long as a node always has correct information about its immediate
successor, the correctness of the DHT algorithm (for searching and retrieving resources) is
guaranteed; [3] gives a detailed analysis of this issue. However, participants in a pure P2P system
tend to be unreliable and can fail, join or leave frequently. We therefore use a successor list,
which holds the N immediate successors along the ring, to handle this issue. Again, section 4 gives a
detailed discussion of this strategy. To improve the stability of the system, we
also assign each node a predecessor. The predecessor simplifies
the handling of a join or a partial failure.
The basic idea of ID positioning is to find the immediate predecessor and successor of a given ID.
When a new node joins the group, it is inserted between this predecessor and
successor. When a resource is uploaded or downloaded, the system copies the
resource to or from the successor of this ID.
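The positioning rule can be sketched with a sorted set that stands in for the ring. This is a
single-process illustration with global knowledge of all node IDs, which a real node does not have;
the class name RingPosition and the use of long IDs instead of 160-bit values are assumptions.

```java
import java.util.TreeSet;

class RingPosition {
    private final TreeSet<Long> nodeIds = new TreeSet<>();

    void addNode(long id) { nodeIds.add(id); }

    // First node ID at or after the given ID, wrapping around to the smallest ID.
    // Throws NoSuchElementException if the ring is empty.
    long successorOf(long id) {
        Long next = nodeIds.ceiling(id);
        return next != null ? next : nodeIds.first();
    }

    // Last node ID strictly before the given ID, wrapping around to the largest ID.
    // Throws NoSuchElementException if the ring is empty.
    long predecessorOf(long id) {
        Long prev = nodeIds.lower(id);
        return prev != null ? prev : nodeIds.last();
    }
}
```

With nodes 10, 40 and 90 on the ring, a joining node with ID 25 would be inserted between 10 and
40, and a resource whose ID is 95 would be stored on node 10 because the search wraps around.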
The naïve positioning algorithm is to follow the ring itself. In other words, the current node asks
its immediate successor to do the search, and this pattern repeats until the search either succeeds
or fails. However, performance is a problem if we maintain only a single successor, since the ring
can become so large that a linear (O(n)) search is unrealistic. CHORD uses an
auxiliary data structure, the finger list, to improve the average-case search time complexity to
no more than O(log n). We adopted this idea and applied it to our system. This approach still has
some problems, which are discussed in the next section.
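The effect of the finger list can be illustrated with a small single-process simulation. It follows
the finger definition of the Chord paper [3], where entry i of node n points to the successor of
(n + 2^i), and is not taken from the QuickPeer source. The class name FingerRing, the 16-bit ID
space and the use of global knowledge to build the finger lists are assumptions made to keep the
sketch short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Toy simulation of finger-list lookup on a small ring.
class FingerRing {
    static final int BITS = 16;            // toy ID width; the paper's IDs are 160 bits
    static final int SPACE = 1 << BITS;    // number of positions on the ring

    private final TreeSet<Integer> nodes = new TreeSet<>();
    private final Map<Integer, int[]> fingers = new HashMap<>();

    void addNode(int id) { nodes.add(id); }

    // First node at or after id, wrapping around (global knowledge; simulation only).
    int successorOf(int id) {
        Integer next = nodes.ceiling(id);
        return next != null ? next : nodes.first();
    }

    // Entry i of node n's finger list points to successor((n + 2^i) mod 2^BITS).
    void buildFingers() {
        for (int n : nodes) {
            int[] table = new int[BITS];
            for (int i = 0; i < BITS; i++) {
                table[i] = successorOf((n + (1 << i)) % SPACE);
            }
            fingers.put(n, table);
        }
    }

    // True if x lies in the ring interval (from, to]; from == to means the whole ring.
    static boolean inInterval(int x, int from, int to) {
        if (from < to) return x > from && x <= to;
        return x > from || x <= to;
    }

    // Returns {responsible node, hops taken} for a lookup of key starting at node start.
    int[] lookup(int start, int key) {
        int current = start;
        int hops = 0;
        while (true) {
            int[] table = fingers.get(current);
            int successor = table[0];
            if (inInterval(key, current, successor)) {
                return new int[] {successor, hops + 1};
            }
            int next = successor;
            for (int i = BITS - 1; i >= 0; i--) {
                // Pick the farthest finger that still lies strictly before the key.
                if (table[i] != key && inInterval(table[i], current, key)) {
                    next = table[i];
                    break;
                }
            }
            current = next;
            hops++;
        }
    }
}
```

In this simulation every finger entry is correct, so each hop at least halves the remaining distance
to the key. A running system has to tolerate stale entries, which is where the problems mentioned
above arise.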
Second, we do not want to keep redundant copies of resources on nodes’ hard disks,
wasting space. Here “redundant” means copies that are neither the master copy nor a valid
replica. A garbage collection routine is invoked periodically to check the validity of all
resources. If a resource replica has not been claimed for a long time, the routine checks
whether it is redundant. If it is, the replica is removed from the disk and from the local resource
database.
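One pass of such a garbage collection routine might look like the following sketch. The class name
ReplicaCollector, the idle limit and the isRedundant predicate are assumptions; the paper does not
say how long “quite a while” is or how redundancy is determined, and removal from disk is reduced
here to removal from an in-memory table.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Predicate;

class ReplicaCollector {
    // Resource name -> time (in milliseconds) at which the replica was last claimed.
    private final Map<String, Long> lastClaimedMillis = new HashMap<>();
    private final long idleLimitMillis;

    ReplicaCollector(long idleLimitMillis) { this.idleLimitMillis = idleLimitMillis; }

    // Records that a replica was claimed (or stored) at the given time.
    void claim(String resource, long nowMillis) { lastClaimedMillis.put(resource, nowMillis); }

    boolean holds(String resource) { return lastClaimedMillis.containsKey(resource); }

    // One collection pass: a replica is removed only if it has been idle for at least
    // the limit AND the caller-supplied check says it is redundant.
    // Returns the number of replicas removed.
    int collect(long nowMillis, Predicate<String> isRedundant) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = lastClaimedMillis.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            boolean idle = nowMillis - entry.getValue() >= idleLimitMillis;
            if (idle && isRedundant.test(entry.getKey())) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

The pass could be scheduled periodically, for example with a ScheduledExecutorService; the
scheduling itself is left out of the sketch.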
4. PROPERTIES
Since our system consists of two parts, the framework and the service algorithm, their
properties also need to be discussed separately. This paper focuses only on the properties
4.1 Scalability
Scalability is one of the fundamental arguments that P2P advocates use against the regular
client-server architecture. In pure (decentralized) P2P systems especially, there is no obvious
bottleneck, so a pure P2P system should exhibit great scalability. However, this is not the
case for some existing systems. For example, Gnutella works fine when the group contains
thousands of nodes, but its search scheme results in flooding-like communication among
nodes, which prevents Gnutella from handling millions of nodes. As the group grows
larger, the efficiency of the search and positioning algorithm becomes extremely
important. More precisely, scalability should be evaluated together with performance.
Therefore, no linear or flooding algorithm can guarantee great scalability.
In contrast, the adoption of a DHT algorithm ensures the scalability of our system. We claim that our
system can handle millions of nodes with reasonable performance. In particular, it is the
finger list, which has 160 entries, that improves scalability. In theory, the entries in the
finger list represent nodes that are scattered along the ring. By communicating
with these nodes directly, the system can take a much shorter path to locate the real successor of a
resource. A search has O(log n) complexity with very high
probability; [3] gives a proof of this claim. However, we should note that with this
strategy, finding a resource takes much less time than failing to find it.
The reason is that there is no way to be sure that a particular entry in the finger list actually
refers to the node it is supposed to represent. The finger list is not as important as the immediate
successor: wrong information in the finger list affects only performance, not
correctness. Entries in the finger list are therefore updated much less frequently than the immediate
successor and the entries in the successor list. A failed search, by contrast, has to check each
entry, starting from the bottom of the list, until it can be sure that the resource does not exist.
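To make the difference between linear and logarithmic search concrete, the following rough
calculation compares the two for a given group size. It assumes that every finger entry is correct,
so that each hop at least halves the remaining distance, which is the usual argument in [3]; the
class name HopEstimate is hypothetical and the numbers are orders of magnitude, not measurements of
QuickPeer.

```java
class HopEstimate {
    // Worst case when walking the ring one successor at a time:
    // every other node may have to be visited.
    static long linearWorstCase(long nodes) {
        return nodes - 1;
    }

    // ceil(log2(nodes)) for nodes >= 1: the order of magnitude of the hop count
    // when each hop at least halves the remaining distance.
    static int logarithmic(long nodes) {
        return 64 - Long.numberOfLeadingZeros(nodes - 1);
    }
}
```

For a group of one million nodes this gives up to 999,999 hops for the linear walk against a bound
on the order of 20 hops with fingers.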
immediate successor, it will replace it with the next immediate successor in the successor list. It
keeps doing this until it finds a living successor or notices that all the nodes in the list are
gone. In the latter case disaster strikes: the ring breaks, piece by piece. We believe that
there is no way to recover from this disaster without external help. But as long as
the successor list is long enough, the SHA-1 scheme ensures that this is fairly unlikely to happen.
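The failover over the successor list can be sketched as follows. The class name SuccessorList, the
representation of nodes as strings and the isAlive check are assumptions; in a real deployment the
check would be a network probe with a timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class SuccessorList {
    // Nearest successor first, followed by the next ones along the ring.
    private final List<String> successors = new ArrayList<>();

    SuccessorList(List<String> initial) { successors.addAll(initial); }

    // Drops dead entries from the front until a living successor is found.
    // An empty result means that every listed successor has failed, which is the
    // case in which the ring breaks at this node.
    Optional<String> firstAlive(Predicate<String> isAlive) {
        while (!successors.isEmpty()) {
            String candidate = successors.get(0);
            if (isAlive.test(candidate)) {
                return Optional.of(candidate);
            }
            successors.remove(0);
        }
        return Optional.empty();
    }

    int size() { return successors.size(); }
}
```

A longer list lowers the chance of reaching the empty case, at the cost of more entries to keep up
to date.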
4.2.3 Problems
Nothing in this world is perfect, and neither is our system. We cannot claim that it can handle
network partitions or the concurrent failure of all the nodes in some successor list.
5. IMPLEMENTATION ISSUES
We would like to use this section to discuss what we learned during the design and
implementation. The system was originally designed for a graduate
computer science course, but we soon realized that it would go beyond that, not only
because of its 11,000+ lines of non-comment code, but also because it involves two separate
concerns: a good framework design and a scalable DHT-based distributed system. We believe that
each of them would make a sufficiently complicated project on its own. Our design and implementation
can be divided into 8 stages; for each of them we describe the problems we encountered, how we solved
them and what we learned.
designing the overall architecture and rough execution models. Things were working quite well at
that time. We decided to provide a framework for distributed application development and to give
a simple implementation of a pure P2P system by applying this framework. The candidates were
CHORD and FreeNet, but we knew that this choice should not affect our design of the framework
part. To provide good reusability, we referred to [1] and decided to use the strategy pattern. At
the end of this stage, we had a clear view of the whole system, and the design was detailed enough
for us to start implementation.
5.5 Redesign
We did not touch this work for several days. When we came back, we realized that most of
the problems were due to our misunderstanding of the notion of remote nodes. Also, we had not
yet really considered or solved the consistency problems. We realized that an overall redesign and
reconstruction was necessary. We went back to our original design, redefined the remote node, and
carefully identified the critical sections. Then we changed our implementation substantially. Things
seemed to be moving in the right direction: nodes could now communicate, and resources could be sent
and replicated. But once in a while some weird behaviour would still occur.
5.6 Fine-tune
We then started much more careful testing and debugging. The problems we noticed at this
stage were the following: resources would disappear after a while or after a node failure; the ring
would break if nodes joined and left the group in certain patterns; and, most important of all, our
system was not scalable at all. The speed became intolerable when we ran three instances (note that at
this stage we had not yet implemented the GUI). We had to check and trace the DHT algorithm very
carefully, catching any unsound mapping from the algorithm to the implementation. Our handling of
critical sections was also problematic: we had forgotten to handle deadlock.
6. CONCLUSION
Here we list several lessons we learned from this project:
• Testing and debugging are quite hard in the domain of distributed applications. Multiple
threads, multiple processes and network communication make it almost impossible to do
relatively complete testing.
• The development task lacks tool support. Regular debugging tools are not helpful in this
case, and programming languages oriented towards distributed applications are not well developed.
• A careful and detailed design can make things better and easier to control.
• Separation of concerns is very important for reducing system complexity.
7. FUTURE WORK
This work is far from complete. As far as the framework is concerned, other
experimental applications should be implemented with it in order to verify its
reusability and evolvability. The DHT algorithm requires more detailed testing. Also, in order to
create useful software, more features need to be plugged in, for example more services and
a more sophisticated GUI.
8. REFERENCES
[1] Gamma, Erich, et al. Design Patterns: Elements of Reusable Object-Oriented Software.
Addison-Wesley, 1995.
[2] Project JXTA: An Open, Innovative Collaboration.
http://www.jxta.org/project/www/docs/OpenInnovative.pdf
[3] Stoica, Ion, Morris, Robert, Karger, David, Kaashoek, M. Frans, Balakrishnan, Hari. Chord:
A Scalable Peer-to-Peer Lookup Service for Internet Applications. In SIGCOMM, 2001.
[4] JAXB: Java Architecture for XML Binding.
http://developer.java.sun.com/developer/technicalArticles/WebServices/jaxb/