
2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

A High Performance, QoS-enabled, S3-based Object Store

Yusuke Tanimura, Seiya Yanagita, and Takahiro Hamanishi


National Institute of Advanced Industrial Science and Technology (AIST)
Tsukuba, Ibaraki, 305–8568, JAPAN
Email: {yusuke.tanimura, s.yanagita, t-hamanishi}@aist.go.jp

Abstract—A scale-out and reliable object store is an important building block of a cloud service, for storing virtual machine images, backups, and large application data. As such an object store, Amazon S3 is available for Amazon EC2 users, and other S3-compatible storage systems are also used in private clouds. However, there is concern about performance instability when many applications concurrently access the storage service, due to the characteristics of shared use.
This paper presents an approach to introducing a QoS-enabled function into the S3-based object store. The object store accepts an explicit performance request as an advance reservation, and enables QoS for the access with an extended S3 RESTful interface. Implicit, static performance settings are also possible for the unmodified S3 interface. PapioS3, an object store which supports both of these S3 interfaces, is developed to implement the approach, along with high-performance upload/download using multipart data transfer. The evaluation confirms the performance of PapioS3, and its QoS capability at the S3 data transfer, in several situations where multiple S3 clients concurrently access the same PapioS3 system. In addition, the QoS effect is compared with a load balancing approach in an existing object store, as part of the experiment.

I. INTRODUCTION

In an IaaS (Infrastructure as a Service) cloud environment where virtual machines are deployed, a scalable and reliable storage service is indispensable for hosting virtual machine images, backups, and large volumes of application data. Although traditional NFS or SAN-based storage systems are still used for this kind of service, scale-out object stores are being adopted because they respond to the demand for scalability in terms of performance, capacity, management, etc. In Amazon EC2 (Elastic Compute Cloud) [1], Amazon S3 (Simple Storage Service) [2] takes on the role of such an object storage service, and the S3 interface allows users to upload, download, and manage data as objects. Amazon does not disclose the internal architecture of S3, but some storage systems [3,4] provide an S3-compatible interface. The S3 interface is used by various applications and systems, including cloud management platforms such as OpenStack [5] and CloudStack [6].

However, there is a concern about performance instability when many applications concurrently and heavily access the same S3-compatible object store, due to the characteristics of shared resources, as shown in Figure 1. When storage resources are shared, the available I/O latency and throughput vary for users, depending on time and allocated servers [7,8]. In the object store, proper replication and load balancing mitigate the problem, but they might not respond to every individual request of the applications. In fact, Amazon provides another storage service called EBS [9], which can deliver dedicated and stable performance to each application over a conventional block storage interface. However, EBS is not a reasonable place to store infrequently accessed, large data, because the price for volume and I/O is higher than that of S3. It would be useful if EC2 users were able to store large data in S3 for a long period of time at low cost, and, only when they request it with additional payment, access the data with a certain level of performance. Also, apart from the financial cost of commercial clouds, if people want to provide a performance-assured cloud service, the QoS of the S3-based object store would be an important piece.

Fig. 1. Example of shared storage in an IaaS cloud environment

This paper presents an approach which allows users to have performance guarantees in the use of an S3-based object store, by making a performance reservation in advance. The reservation is intended to be used as an explicit request to the object store from the users. At the same time, an implicit reservation is allowed so that each application will have at least a certain level of performance. To provide this feature, a storage system which supports performance reservation [10], Papio, is used at the S3 backend. Then, for the explicit reservation, I/O operations of the S3 REST API are extended to conform to the Papio interface. The reason we focus on S3 is that various S3 clients are available; as such, it is a de facto standard. This paper describes our design of a performance-guaranteed, or QoS-enabled, S3-based object store and the extension of the S3 interface, and then shows a prototype implementation of it. The prototype has been evaluated to examine the QoS capability at the S3 data transfer, through a comparison with a load balancing approach using RADOS [11], an object store with a novel design, in several situations where multiple S3 clients access the same resources.

In summary, this work makes the following contributions:

978-1-4799-2784-5/14 $31.00 © 2014 IEEE 784


DOI 10.1109/CCGrid.2014.76
• We introduce the concept of a QoS-enabled function mapped to performance reservations into an S3-based object store, in order to solve the shared-use problem in the IaaS cloud environment.

• We present the design of a QoS-enabled object store by using reservation-based, performance-guaranteed storage at the S3 backend. While some S3 operations are extended for an explicit user request, an implicit request in a normal operation is also implemented.

• We evaluate our approach by examining the QoS capability in shared-use situations and comparing the benefit with a load balancing approach in one of the existing object stores, RADOS.

II. SYSTEM DESIGN

Our proposed QoS-enabled, S3-based object store is based on the interface of Amazon S3 and the performance guarantee function of the Papio storage system. In this section, we briefly introduce both of them and present our system design with an S3 extension.

A. Amazon S3

Amazon S3 (Simple Storage Service) is an online storage service for the Internet, provided by Amazon [2]. The service for storing and retrieving data is available anywhere on the Web, at any time. The scale continues to grow, and 2 trillion objects had been stored as of April 2013. When using Amazon EC2, S3 can be used for managing virtual machine images, templates, and backups.

Although the internal architecture of S3 is not disclosed, the REST and SOAP interfaces for S3 are open. Thus various S3 client tools and S3-compatible storage systems are available. S3 users normally create a bucket and then store data as objects in the bucket. Upload and download of an object are performed via HTTP or BitTorrent. The maximum object size is 5 TB, and for large objects, multipart data transfer is available. The details can be found in the documentation [12] on Amazon's web site.

B. Papio Storage System

Papio is an object-store type storage system that supports parallel I/O and performance reservation functionality [10]. The 'bucket/object' semantics of Papio resemble those of S3. However, when a bucket is created in the Papio storage, an amount of disk space equal to what the user requested is reserved for the bucket. Then the user can reserve I/O performance with the desired throughput (e.g., MB/sec), access type (write or read), and access time (from start to end), for the bucket to be written or for an object in the bucket to be read. A reservation ID is issued by Papio when the reservation is accepted. The user's program, as a Papio client, can access the bucket or the object with the reservation ID. Internally, Papio allocates storage resources according to the reservation, and controls the I/O throughput of the storage devices and the storage network to provide the requested throughput to the Papio client. In the latest version of Papio, throughput of the storage network is controlled by PSPacer [13] and disk I/O throughput is managed by Papio's own scheduler tuned for Solid State Drives (SSDs).

C. Design of PapioS3

Our QoS-enabled, S3-based object store is called PapioS3. We designed it so that Papio provides QoS for I/O throughput at the S3 backend and PapioS3 exposes the QoS function of Papio, which allows users to have their desired I/O throughput. However, there are no means for users to express a performance demand with the S3 interface. Therefore we decided to support the QoS function in two ways, with both explicit and implicit methods.

For the explicit method, users can request the desired performance from the backend Papio by using Papio's CLI or another Web-services interface based on GNS-WSI3 [14]. The request tools allow users to make reservations in advance of their actual accesses, as described in the previous subsection, and the users receive a reservation ID. Then the users can access PapioS3 with the extended S3 interface. Since the reservation ID can be set in the extended interface, Papio can reference the reservation information using the ID and control I/O throughput for the access. In PapioS3, the extended interface is implemented in the REST API. The details of the extension are explained in the next subsection.

For the implicit method, the performance request is set statically. Then, when users access PapioS3, they always get the pre-set performance. This method is less flexible than the explicit one. If resources are not available to meet the pre-set requirement, the access will fail with an error, because the implicit performance reservation is made just before the access. However, this design allows users to use the original S3 interface with the I/O control function of Papio. Furthermore, both methods can be enabled at the same time, so that users choose the explicit method only when necessary and use the implicit one at all other times.

We also considered the performance of the S3-based data transfer. In our initial experiment, the S3 data transfer in a single stream of 'PUT Object' and 'GET Object' was much slower than direct access to the Papio storage, due to the overhead of S3 over HTTP. The slowness would make the high-performance and QoS capabilities of Papio less effective. In order to maximize the effect of our approach in PapioS3, we worked on multipart data transfer for S3, in which a single, typically large, object is separated into multiple parts for upload or download. The benefits are, for example, that the parts can be transferred in parallel for performance, and that the amount of data transfer is reduced by preventing retransfer of previously transferred and unchanged parts. Our implementation of the multipart data transfer focuses on the former and thus supports parallel streams, the details of which are presented in Section III-B.

D. Extension of the S3 REST API

The extension of the S3 REST API¹ for PapioS3 is minimized as much as possible in our design. No new operations are added, and all added parameters are defined as non-mandatory options. If no option for Papio is set in the request, the operation is regarded as a standard S3 operation.

¹ The extension is based on Amazon S3 API Reference Version 2006-03-01.
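As a concrete illustration of this extension, the sketch below (ours, not code from the paper) builds the headers for a 'GET Object' request signed with AWS Signature Version 2, the scheme of the 2006-03-01 S3 API the extension is based on, and attaches the optional reservation-ID header (X-Papio-Access-ID). The credentials, bucket, key, and reservation ID are hypothetical; a server without the extension would simply ignore the extra header.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from typing import Dict, Optional

def signed_get_headers(access_key: str, secret_key: str,
                       bucket: str, key: str,
                       reservation_id: Optional[str] = None) -> Dict[str, str]:
    """Build headers for an S3 'GET Object' request signed with AWS
    Signature Version 2, optionally carrying a Papio reservation ID
    in the X-Papio-Access-ID header of the PapioS3 extension."""
    date = formatdate(usegmt=True)
    # V2 string-to-sign: VERB \n Content-MD5 \n Content-Type \n Date
    # \n CanonicalizedResource (no body and no x-amz-* headers here).
    string_to_sign = f"GET\n\n\n{date}\n/{bucket}/{key}"
    signature = base64.b64encode(
        hmac.new(secret_key.encode(), string_to_sign.encode(),
                 hashlib.sha1).digest()).decode("ascii")
    headers = {"Date": date,
               "Authorization": f"AWS {access_key}:{signature}"}
    if reservation_id is not None:
        # Non-mandatory option: without it, this is a plain S3 GET.
        headers["X-Papio-Access-ID"] = reservation_id
    return headers

# Hypothetical credentials and reservation ID, for illustration only.
hdrs = signed_get_headers("AKEXAMPLE", "secret", "mybucket", "data.bin",
                          reservation_id="1234")
```

Because the extension only adds a header, an unmodified S3 client library that allows custom request headers could set it without code changes, which matches the "non-mandatory options" design above.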

TABLE I. A LIST OF THE EXTENDED S3 OPERATIONS FOR THE RESERVATION

Operation                 | Description
PUT Bucket                | Create a new bucket tied to the space reservation feature of Papio
GET Object                | Retrieve an object from the Papio storage system
PUT Object                | Put an object in a specified bucket in the Papio storage system
Initiate Multipart Upload | Initiate a multipart upload to the Papio storage system

Fig. 2. Optional parameters for 'PUT Bucket'

Fig. 3. X-Papio-Access-ID in the request header of the 'GET Object' example

Table I shows the list of extended operations for PapioS3. The 'PUT Bucket' operation of S3, which creates a new bucket, is extended to accommodate space reservation, because each bucket is tied to the reserved space in the backend Papio. Optional parameters for specifying a space requirement (size, lifetime, write and read throughputs at access time, etc.) are added to the request element of 'PUT Bucket,' as shown in Figure 2. 'PUT Object,' 'GET Object' and 'Initiate Multipart Upload,' which write or read data, are extended for the reservation-based access. An optional parameter named 'X-Papio-Access-ID,' for setting a previously obtained reservation ID, is added to the request header of these S3 operations, as in the example shown in Figure 3. When the extra parameters are set in the client request, PapioS3 recognizes the request as an explicit one and uses the space parameters or the ID when intermediating the request to the Papio storage. When they are not set, PapioS3 automatically makes a reservation with Papio for the request, with a statically set requirement.

III. IMPLEMENTATION

A. Overview

We implemented PapioS3 and the corresponding S3 client which supports the extended S3 operations. An overview of our implementation is shown in Figure 4. The PapioS3 server is a frontend and has been implemented based on RADOS Gateway (RGW)² [4], which runs as a CGI program and interacts with the Web server via FastCGI to intermediate S3 operations to the backend storage. RGW was developed for the object store called RADOS [11], but it implements a mechanism to translate and map the S3 operations to the API of the backend storage. In our implementation, the API mapping is switched from RADOS to Papio, without modifying the translation layer.

Fig. 4. PapioS3 overview

The PapioS3 client was implemented with JetS3t [16], an open source Java toolkit designed to support S3 and other storage services. JetS3t³ itself was slightly modified to support the S3 extension and improved for the multipart data transfer.

B. Multipart Data Transfer

As described previously, our aim in supporting the multipart data transfer in PapioS3 is to achieve high performance by parallel streams. However, there was a problem in that Papio controls the I/O performance of a single opened object access. Note that Papio implements striping access, but an object should be opened by a single process. Hence, multi-parted S3 requests should be combined into a single opened access to the Papio storage. In our implementation, the RGW process launches an I/O process for each reservation-based access, and the I/O process represents a set of multi-parted requests, as shown in Figure 5. The details of the upload and the download are explained separately in the next two subsections.

On the client side, we implemented the multipart download in parallel streams, which had not been supported in the original code of the JetS3t version we used. We also added a function for specifying the size of the parts and the number of parallel streams, in both the multipart upload and download⁴.

1) Upload Implementation in PapioS3 Server: When an RGW process receives an initialization request for a multipart upload (Initiate Multipart Upload), it launches an I/O process which represents writing data to the Papio storage. Then, the RGW process passes subsequent write requests (Upload Part) to the I/O process in the proper order. In case previous parts of the data have not yet arrived and been written to Papio, the RGW process holds the request in a waiting list, while keeping the data in a temporary storage space. After all the previous data have been written, the request is passed to the I/O process. Then the I/O process reads the data from the temporary space and writes it to Papio. In our implementation, a RAM disk (e.g., /dev/shm) is used as the temporary space by default.

² We used the RGW code that was included in Ceph Version 0.32 [15].
³ We used JetS3t Version 0.8.1a.
⁴ In the original JetS3t, it is possible for users to specify the number of parallel streams for the multipart upload, but not for the download.
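The in-order flush logic described for 'Upload Part' handling above can be modeled in a few lines. This is a simplified sketch of the idea (out-of-order parts parked in a waiting list until they become contiguous), not the actual RGW or I/O-process code:

```python
class OrderedPartWriter:
    """Toy model of the ordering logic above: multipart 'Upload Part'
    requests may arrive out of order, but the backend object must be
    written by a single sequential stream, so out-of-order parts are
    parked in a waiting list until they become contiguous."""

    def __init__(self, sink):
        self.sink = sink      # callable that 'writes' (part_no, data)
        self.next_part = 1    # the next part number the backend expects
        self.waiting = {}     # parked parts, keyed by part number

    def receive(self, part_no: int, data: bytes) -> None:
        self.waiting[part_no] = data
        # Flush every part that is now contiguous with the written prefix.
        while self.next_part in self.waiting:
            self.sink(self.next_part, self.waiting.pop(self.next_part))
            self.next_part += 1

written = []
writer = OrderedPartWriter(lambda n, d: written.append(n))
for n in (2, 4, 1, 3):            # parts arriving out of order
    writer.receive(n, b"payload")
# written is now [1, 2, 3, 4]: strictly in order despite arrival order
```

In the real system the parked data lives on a RAM disk rather than in memory, but the control flow is the same: only the contiguous prefix is ever handed to the single writer.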

In case the previous data has already been written, the arriving part is passed directly to the I/O process and written to Papio immediately. When the RGW process receives a complete request (Complete Multipart Upload) or an abort request (Abort Multipart Upload) for the multipart upload, the RGW process terminates the I/O process. Automatic self-stop on I/O errors is also implemented in the I/O process.

2) Download Implementation in PapioS3 Server: Unlike the multipart upload, no explicit operations for initiating and completing the request are defined in the S3 interface for the multipart download. A 'GET Object' request in which the Range parameter is set in the request header is simply repeated by the S3 client. However, even when the Range parameter is set, it does not mean that the request is part of a multipart download. Therefore, in our implementation, the RGW process checks for every read request whether a corresponding I/O process is running. Since the same X-Papio-Access-ID identifier is embedded in a series of requests, the RGW process can detect the series. Unless an I/O process has already started for the series, the RGW process launches a new I/O process before passing the request to it. Otherwise, the RGW process just passes the request to the running I/O process which was launched by one of the previous requests.

In the I/O process, the received read requests are managed in a queue. The I/O process fetches one request from the queue, reads the requested data from the Papio storage, and writes it to a temporary storage space (e.g., /dev/shm). As the write operation is performed asynchronously, the I/O process can continue to process the next request without any pause. When the RGW process is notified of the completion of the write operation, it reads the data from the temporary space and sends it to the client. The RGW process deletes the data in the temporary space at the end.

Because the server is unaware of the completion of the multipart download, the I/O process might remain on the server side after the PapioS3 client finishes the access. In order to solve this problem, automatic self-stop of the I/O process is implemented: the I/O process checks the end time of the reservation of the access and terminates itself when the reservation is over.

Fig. 5. Implementation of the I/O process in PapioS3

C. User Authentication and Access Control

The original RGW provides a utility tool, radosgw-admin, to manage users. The 'user create' operation of radosgw-admin generates a pair consisting of an Access Key and a Secret Key, which is needed for authentication in the S3 interface. The original RGW maps all S3 users to one RADOS account, and the S3 users' information is stored in RADOS. In contrast, in PapioS3 one S3 account is mapped to one Papio account. The S3 user information, including the mapping, is stored in the local file system on the node where the PapioS3 server runs. Since this is a prototype implementation, a default performance requirement for the implicit reservation is set in each PapioS3 server and shared by all clients accessing the same server. In addition, PapioS3 supports only private permission for buckets and objects, though it is still possible to share an object by using the 'signing the request' feature of S3. More flexible configurations and permissions would be a future task in the development of PapioS3.

D. Data Integrity

In the S3 interface, Content-MD5 is used for verifying the message integrity of PUT and GET operations. At a 'PUT Object' operation, the S3 client calculates an MD5 digest of the object, as a base64-encoded 128-bit MD5 digest, and sends it to the S3 server as the Content-MD5 value in the request header. The S3 server compares the digest of the received object to the Content-MD5 value for verification. Then the Content-MD5 value is stored as an ETag (Entity Tag) with the verified object. In the multipart data transfer, an MD5 digest is calculated per chunk and all the digests are concatenated to form an ETag. At a 'GET Object' operation, the ETag is sent to the S3 client so that the S3 client can verify the object.

In our implementation of PapioS3, a 'PUT Object' is performed as described above. However, the PapioS3 client does not verify the object with the ETag, even though the ETag itself is sent to the client. We assume that such verification should be done by application users, if they think it is necessary.

On the other hand, the ETag feature is also helpful for checking for updates to an object. 'If-Match' and 'If-None-Match' can be set in the request header, and with these condition parameters the S3 client can upload or download only updated objects or chunks. This feature seems to be available in RGW's translation layer but has not been tested in PapioS3 yet.
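For reference, the Content-MD5 value described above is just the base64 encoding of the raw 16-byte MD5 digest of the body; a minimal helper (ours, for illustration):

```python
import base64
import hashlib

def content_md5(body: bytes) -> str:
    """Content-MD5 as used by S3: the base64 encoding of the raw
    128-bit (16-byte) MD5 digest of the request body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

# The well-known value for an empty body:
content_md5(b"")  # -> "1B2M2Y8AsgTpgAmY7PhCfg=="
```

Note that this is the base64 of the binary digest, not of the 32-character hex string; sending the hex form is a common client bug that makes the server reject the upload.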

TABLE II. EXPERIMENT ENVIRONMENT

Zone            | Role in PapioS3 (w/ Papio)           | Role in RGW-orig (w/ RADOS)          | Machine Specifications
S3              | PapioS3 clients, PapioS3 servers     | PapioS3 clients, the original RGW server | Intel Xeon E5540 (2.53GHz, 4 cores) CPU ×2, 48GB memory, CentOS 6.3.
Backend storage | Storage servers in Papio             | OSDs in RADOS                        | Intel Xeon E3-1230 (3.2GHz, 4 cores) CPU, 8GB memory, CentOS 6.2. OCZ Vertex3 (240GB) via SATA 2.0 was used for storing data.
Management      | Management server in Papio           | Monitor in RADOS                     | AMD Opteron 6128 (2GHz, 8 cores) CPU, 8GB memory, CentOS 6.2. OCZ Vertex2 (100GB) via LSI-Logic MegaRAID SAS 9260-8i was used for storing metadata in Papio.

Table II shows the machine configurations and specifications we used for all the experiments shown in the later subsections. All machines were connected to a 10 Gigabit Ethernet switch in each zone, and the S3 zone and the backend storage zone were connected by a single 10 Gigabit cable. We used the same machines for similar roles in both systems, PapioS3 and RGW-orig. The version of RGW-orig was 0.72.2 and most of its parameters were left at their defaults. In the RADOS system, pools were created with 1024 PGs and no replication, in order to compare it with the Papio storage, which does not support automatic replication. The stripe size was set to 4 MB in RGW-orig while it was set to 1 MB in Papio. As the Web server for PapioS3, Apache (version 2.2.15) and mod_fcgid (version 2.3.9) were used. Since the chunk size of the multipart data transfer was set to 32 MB in the PapioS3 clients, the I/O size was also set to 32 MB in PapioS3's RGW and I/O processes. For RGW-orig, optimized versions of Apache (version 2.2.22) and mod_fastcgi (version 2.4.7) with 100-continue [17] support were used, and the RGW process was executed as an external server.

We implemented a benchmark with the PapioS3 client library, which uploads or downloads an object of 1 GB in size. Then we ran the benchmark in various situations. In the figures shown below, all results are an average of 5 trials.

B. Performance of PapioS3

We evaluated the performance of the S3-based data transfer of PapioS3 when the benchmark was executed without interference from other I/O workloads. Figure 6 shows the upload performance of PapioS3 when 180MB/s or 360MB/s throughput was reserved for write access to the backend Papio storage. Inside the Papio storage, one storage server was allocated for achieving 180MB/s throughput and two storage servers were allocated for 360MB/s. In Figure 6, Standard shows the result of the normal, non-multipart upload, and N threads shows the result of the multipart upload in N parallel streams. Papio shows the result of direct access to the Papio storage. This result indicates that the multipart upload can hide the overhead of the S3-based operation and achieve more than 80% of the performance that the Papio storage provides, while the Standard operation does not make enough use of that performance. Since we confirmed that the Papio storage itself achieved the reserved rate in all cases, the reduced performance against Papio, in particular for the 360MB/s request, would be caused by some internal overheads of the PapioS3 server.

Fig. 6. Upload performance of PapioS3

Similarly, Figure 7 shows the download performance of PapioS3 when 200MB/s or 400MB/s throughput was reserved for read access to the Papio storage. Inside the Papio storage, one storage server was allocated for achieving 200MB/s throughput and two storage servers were allocated for 400MB/s. In this result, the multipart download could increase the throughput through parallel streams and achieve more than 98% of the performance that the Papio storage could provide.

Fig. 7. Download performance of PapioS3

These results indicate that the multipart data transfer is essential to achieve high throughput over the S3 interface when that throughput is provided by the backend storage system.

C. QoS Capability of PapioS3 in Concurrent Accesses

1) QoS in PapioS3: First, we examined the basic QoS capability of PapioS3 in a simple configuration. The Papio storage, which serves as the backend of PapioS3, had only one storage server, and thus Papio had to control the I/O throughput rates on that storage server to satisfy each performance demand of the concurrent clients. Figures 8 and 9 show the measured performance of 5 concurrent accesses for upload and download, respectively. The 5 clients (A∼E) were each launched on a different node and their throughput requests were set in the ratio of 5:4:3:2:1. The aggregate throughput was set to 180MB/s for the upload test, and 200MB/s for the download.
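The 5:4:3:2:1 split used in this test maps an aggregate reservation to per-client rates as follows (a trivial helper of ours, for illustration):

```python
def split_throughput(total_mb_s, weights):
    """Divide an aggregate reserved rate among clients in a fixed
    ratio, as in the 5:4:3:2:1 request pattern of this test."""
    unit = total_mb_s / sum(weights)
    return [w * unit for w in weights]

# 180 MB/s (the upload test) shared among clients A..E at 5:4:3:2:1:
split_throughput(180, [5, 4, 3, 2, 1])  # -> [60.0, 48.0, 36.0, 24.0, 12.0]
```

So in the upload test, client A requests 60MB/s and client E requests 12MB/s; the download test scales the same ratios to a 200MB/s aggregate.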

Fig. 8. QoS test in 5 concurrent uploads

Fig. 9. QoS test in 5 concurrent downloads

The results indicate that the available throughput of the storage server was shared properly and every client achieved a rate close to the requested one. The achieved rates of Clients A and B for the upload were slightly lower than the corresponding requested rates, due to the overhead in the PapioS3 server. However, the shortfalls are less than 1%.

2) Comparison between PapioS3 and the original RGW: Second, we examined our integrated QoS functionality through a comparison with a simple load balancing approach in RGW-orig. Our purpose in this comparison is not just to contrast measured performance but to clarify both benefits and shortcomings of the QoS functionality, based on the experiment results. Therefore, we measured the performance of PapioS3 with a QoS setting that used the maximum level of the available total throughput, which was equivalent to the total that the RGW-orig system could provide. This means we first conducted the experiment with RGW-orig and then ran the experiment with PapioS3 in accordance with the RGW-orig result.

Setup - The experiments were performed in a larger environment, where each backend storage system consisted of 8 storage servers or OSDs. In the benchmark execution, 12 PapioS3 clients were launched to concurrently access the same object store, PapioS3 or RGW-orig, and each client performed an upload or download with 3 parallel streams. As 3 PapioS3 servers were launched in front of the Papio storage, the accessed server for each client was manually arranged. All the clients accessed the same RGW-orig server, which was launched in front of the RADOS system.

Fig. 10. Comparison between PapioS3 and the original RGW

Results - Figure 10 a)∼c) depicts the I/O throughput of each client which performed the upload or download operation. On the horizontal axis, CN means the N-th client and the subsequent -U or -D represents upload or download, respectively. In Figure 10, a) shows a case in which all the clients performed the upload and, likewise, b) shows a download-only case. Since RGW-orig distributes the I/O loads over the OSDs, the clients were equally affected by each other and each had similar throughput. The total aggregate throughput was 71% for the upload, or 84% for the download, of the maximum theoretical throughput of the 10 Gigabit Ethernet. For PapioS3, we targeted the total aggregate rate to be slightly higher than the rate that RGW-orig achieved, with an unbalanced performance request. In a), 30% of the total throughput was assigned to 2 clients and each of those clients occupied a single PapioS3 server. In b), 50% of the total throughput was assigned to 4 clients. The 4 clients were separated into 2 groups and each group occupied a single PapioS3 server.

In a), PapioS3 successfully provided the target rate for the first 2 clients but the other clients had lower throughput. The total throughput dropped by 16% from the request. In our investigation, the main reason for the decrease would be some internal overheads of the PapioS3 server. In consequence, each PapioS3 server has a certain limit on the number of clients it can handle, depending on their requested performance. In b), the

overheads were low enough and PapioS3 effectively provided each targeted rate without a decrease in the total throughput.

c) shows a mixed case of 2 concurrent uploads and 10 downloads. While the performance of each upload was much affected by other I/O workloads in RGW-orig, the 2 upload clients obtained their high requested rates in PapioS3. The total throughput was targeted to be a much higher rate in PapioS3 than in RGW-orig, and it was well achieved.

Discussion - The advantage of PapioS3 is obviously that it provides the requested throughput to each client. This is all the more beneficial when the upload and download workloads are mixed. However, a few limitations of the current PapioS3 have also been uncovered through the experiments. We saw a case in which some clients failed because the request patterns did not fit the availability situation of the storage servers. This case did not appear in RGW-orig because all clients continued access even when the performance was low. The problem might be solved by an automatic negotiation mode. The negotiation allows a client to use the maximum available throughput at the reservation request, and a request with this mode would rarely fail due to a resource shortage. In particular, implementation of this mode in the PapioS3 server is essential for the implicit reservation. Another problem is the necessity of QoS inside the PapioS3 server itself. Currently, separating the accessed PapioS3 servers for different clients would be the only solution when this problem appears seriously, but it could be fully solved by an implementation effort.

VI. CONCLUSION AND FUTURE WORK

This paper presents a design of a QoS-enabled function in the S3-based object store, keeping S3 compatibility as much as possible. To respond to the demands of individual applications, an explicit performance request is allowed with a minimal extension of the S3 RESTful interface. An implicit performance request is also supported, which allows administrators to set a quota-style QoS, in particular when the S3 interface cannot be changed at all. This QoS feature is implemented based on the performance guarantees provided by the Papio storage system running at the backend of the S3 server. Through our evaluation, it has been shown that the QoS capability can respond to the performance demand of each client, which is not simply possible with the load balancing approach of an existing object store. This result is shown together with the speed-up from our implementation of the multipart data transfer, which achieves the high throughput that the backend storage provides.

On the other hand, since this study is an initial step toward providing our designed QoS feature in the S3-based object store, there are several remaining issues which would be future work, as follows:

• Our S3 extension does not include the reserve operation itself. A standardization effort for an S3- or CDMI [23]-like interface to request a QoS level against the object store would be important. Cooperation with immediate access which does not require performance guarantees would be needed, too.
V. R ELATED W ORK • Since the S3 frontend server works on a fair-share
manner, achieving target rates would be difficult in
Automatic storage reconfiguration and storage tiering based high usage though the backend storage provides an
on a dynamic workload analysis are studied for I/O optimiza- end-to-end QoS. Thus the frontend server should also
tion, including QoS support [18,19]. In these studies, QoS is work on a QoS based manner.
mostly controlled on a volume basis, which is assigned to each
application or user group. In the cloud service, the volume • In PapioS3, a performance reservation is automatically
is assigned to one tenant which consists of a set of virtual mapped to a proper stripe pattern of Papio but the
machines. In Pisces [20], tenant-based performance isolation mapping cannot be changed. Automatic conversion to
and fairness of the shared key-value store are studied. Pisces another stripe pattern corresponding to a new reserva-
supports a weighted allocation of performance to the tenants. tion for the same object would be useful.
QoS controls based on the flash storage array is also available
in commercial products like SolidFire [21], and applied to the • PapioS3 should support more operations and functions
cloud service. of the S3 interface and be improved through further
evaluations with practical applications, so that it be
In contrast, PapioS3 aims at providing a finer-grained, per- used on the production environment.
access QoS function, with the assumption that each access
takes longer time because the accessed object is large enough. ACKNOWLEDGMENT
PapioS3 does not focus on the volume basis, over the block
storage device or file system interface. Our study targets on the This work was partly supported by KAKENHI (23680004).
object store interface and the Amazon S3 interface has been
chosen in PapioS3. Additionally, a performance measurement R EFERENCES
in PapioS3 is referred to throughput (MB/s) whereas many
systems use IOPS. [1] “Amazon EC2,” http://aws.amazon.com/ec2/.
[2] “Amazon S3,” http://aws.amazon.com/s3/.
In our previous works [10,22], any standard access inter- [3] “Swift,” http://swift.openstack.org/.
face is not considered in Papio and the only MPI-IO based [4] “RADOS Gateway,” http://ceph.com/docs/master/radosgw/.
interface is studied for the HPC applications. This work is our [5] “OpenStack,” http://www.openstack.org/.
first attempt to apply the QoS function of Papio to the clouds, [6] “Apache CloudStack,” http://cloudstack.apache.org/.
including a support of the widely used interface in the cloud [7] S. L. Garfinkel, “An Evaluation of Amazon’s Grid Computing Services:
service on the top of Papio. EC2, S3 and SQS,” Harvard University, Tech. Rep. TR-08-07, 2007.

790
[8] A. Iosup, N. Yigitbasi, and D. Epema, “On the Performance Variability of Production Cloud Services,” in Proceedings of the 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2011, pp. 104–113.
[9] “Amazon EBS,” http://aws.amazon.com/ebs/.
[10] Y. Tanimura, H. Koie, T. Kudoh, I. Kojima, and Y. Tanaka, “A Distributed Storage System Allowing Application Users to Reserve I/O Performance in Advance for Achieving SLA,” in Proceedings of the 11th ACM/IEEE International Conference on Grid Computing, 2010, pp. 193–200.
[11] S. A. Weil, A. W. Leung, S. A. Brandt, and C. Maltzahn, “RADOS: A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters,” in Proceedings of the 2nd International Workshop on Petascale Data Storage, 2007, pp. 35–44.
[12] “Amazon S3 API Reference,” http://awsdocs.s3.amazonaws.com/S3/latest/s3-ug.pdf.
[13] R. Takano, T. Kudoh, Y. Kodama, M. Matsuda, H. Tezuka, and Y. Ishikawa, “Design and Evaluation of Precise Software Pacing Mechanism for Fast Long-Distance Networks,” in Proceedings of the 3rd International Workshop for Fast Long Distance Networks, 2005.
[14] “GNS-WSI version 3,” http://www.g-lambda.net/.
[15] “Ceph,” http://ceph.com.
[16] “JetS3t,” http://jets3t.s3.amazonaws.com/index.html.
[17] “RFC 2616, Section 8,” http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html.
[18] E. Anderson, M. Hobbs, K. Keeton, S. Spence, M. Uysal, and A. Veitch, “Hippodrome: running circles around storage administration,” in Proceedings of the 1st USENIX Conference on File and Storage Technologies, 2002.
[19] A. Elnably, H. Wang, A. Gulati, and P. Varman, “Efficient QoS for Multi-Tiered Storage Systems,” in Proceedings of the 4th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage’12), 2012, pp. 1–5.
[20] D. Shue, M. J. Freedman, and A. Shaikh, “Performance Isolation and Fairness for Multi-Tenant Cloud Storage,” in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, 2012, pp. 349–362.
[21] “SolidFire,” http://www.solidfire.com/.
[22] Y. Tanimura, R. Filgueira, I. Kojima, and M. Atkinson, “MPI Collective I/O based on Advanced Reservations to Obtain Performance Guarantees from Shared Storage Systems,” in Proceedings of the 5th Workshop on Interfaces and Architectures for Scientific Data Storage, held in conjunction with IEEE Cluster 2013, 2013, pp. 1–5.
[23] “CDMI (Cloud Data Management Interface),” http://www.snia.org/cdmi.
