REST at Amazon v1
1 Abstract
Resource-oriented distributed system interactions offer improvements over traditional RPC-based service
interactions. They do this by focusing on the data artifacts in the system, or “resources”, rather than the
functional behaviors. Open-content systems presume that all data in those resources is significant to someone and
therefore must be preserved even when only a subset of the content is necessary for the current interaction. By
defining a common set of operations over resources, the functional semantics of the interfaces can be fixed. By
combining those common operations with an open-content-based resource model, the semantic expressiveness of
the interfaces can be extended as necessary. This applies to traditional data-oriented resources and also to
workflow operations.
This document specifies a complete model for open-content resource-oriented service definition using HTTP, SDL
and ION. Interfaces defined with this framework can be automatically externalized as XML/XSD over HTTP, or
JSON over HTTP. Hypertext Transfer Protocol (HTTP), used properly, offers a convenient and widely-
adopted application protocol for implementing RESTful interactions. By defining standard conventions for proper
use of HTTP and combining those with a standard extensible datagram format (ION) and interface definition
language (SDL), we define a complete model for extensible RESTful service definition.
2 Contents
1 Abstract
3 What do you mean by REST?
3.1 Why HTTP?
3.2 General Principles
4 HTTP Client assumptions
4.1 HTTP semantics
4.2 Methods
4.3 Redirects
4.4 Connections
4.5 Retry
4.6 Security
4.7 Encodings
The key principle behind resource-oriented interfaces is to consider the data artifacts, or resources, that
best represent the service-to-service state transition semantics.
In traditional RPC-based interface design, interfaces model the functional behaviors the service is trying to enable.
When one service invokes another service, it does this by invoking a command and a set of parameters. For
instance, a service which processes orders might have an interface that says “sendOrderForProcessing” with an
order-id as a parameter.
In resource-oriented interface design, interfaces are modeled based on the resources needed to define the
semantics of those behaviors. Instead of one service expressing a functional directive to another service with a
command and a set of parameters, resource-oriented interfaces focus on transferring state that represents the
change in the state of the world that one service wishes to communicate to another service (representational state
transfer is also known as REST). For the same example above, instead of expressing the command
“sendOrderForProcessing” a resource-oriented interface would allow you to express the change-in-state desired
by either updating a resource representing the state of the order, or adding the order resource to a queue
resource which represents the orders that need to be processed.
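The contrast between the two styles can be sketched as the requests a client would build. This is only an illustration: the paths, field names and the "READY_FOR_PROCESSING" status value are hypothetical and are not defined by this document.

```python
import json


def rpc_style(order_id):
    """RPC style: name a command and pass it parameters."""
    return {
        "method": "POST",
        "path": "/sendOrderForProcessing",  # hypothetical command endpoint
        "body": json.dumps({"orderId": order_id}),
    }


def resource_update_style(order_id):
    """Resource style, option 1: update the resource that holds the order's state."""
    return {
        "method": "PUT",
        "path": f"/orders/{order_id}",  # hypothetical resource path
        "body": json.dumps({"orderId": order_id, "status": "READY_FOR_PROCESSING"}),
    }


def resource_queue_style(order_id):
    """Resource style, option 2: add the order to a queue resource."""
    return {
        "method": "POST",
        "path": "/queues/orders-to-process",  # hypothetical queue resource
        "body": json.dumps({"order": f"/orders/{order_id}"}),
    }
```

In both resource-oriented variants the request carries the desired state rather than a verb; the service decides what work that state change triggers.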
While this document focuses primarily on resource-oriented interactions, care and attention have been paid to
how resource-oriented service interactions can live alongside traditional RPC-style interfaces. The essence of a
resource-oriented interface is to expose your service’s functionality as a set of resources. These are
self-contained data entities, representations of which the service and its users exchange. Insofar as the service
allows, these entities are manipulated with the standard CRUD operations: create, read, update and delete. When
using HTTP as the communication protocol, these operations are mapped onto the standard HTTP methods: PUT, GET,
POST and DELETE.
As a practical matter, HTTP is the most direct means of implementing this paradigm. The methods and the basic
semantics of the HTTP application protocol are the semantics of REST. (Note: just because a service exposes its
functionality through HTTP does not mean it has a REST interface or is even particularly resource oriented.) In
addition, the resources in a SOA environment are data. As such, the content needs to support data serialization
well. For data serialization, JSON or Ion is the preferred format: JSON externally facing and Ion internally.
One key departure from some REST definitions is that this incarnation is not a “pure REST” approach. The bulk of
your service should be exposed as resources manipulated using the standard HTTP methods. In most cases the
entire service functionality will be exposed as resources. However, there are times when it is necessary to
expose functionality in a way that is really a procedure call. To accommodate situations where transforming the
procedure invocation into a resource change would only obscure it, a mechanism for remote procedure calls is
also defined by this specification.
Because the application-layer semantics of HTTP have been overloaded to support use-cases outside of resource-
oriented interactions, we have found it necessary to revisit the HTTP specification and provide clarifying guidance
around how to use HTTP to achieve robust resource-oriented interactions.
HTTP as an application protocol is well tested, reasonably performant (when used appropriately), flexible,
and well supported. (Note: HTTP is *not* a transport; if you're looking for a transport, check out TCP or SMTP.)
HTTP’s semantics align well with the tenets of both supportable distributed systems and resource-oriented
services.
Resources may be real or virtual. Simple CRUD services, like a simple rolodex service, may only need real
resources. Many services, like the Amazon Item Pipeline, expose virtual resources that map to underlying
resources indirectly.
While HTTP supports persistent connections, this is strictly a performance optimization. HTTP is a stateless
protocol. That is, neither the server nor the client requires context beyond the single request for that request to
be valid and processed. This is an important characteristic that facilitates scaling and failover, and it should be
maintained.
A commonly encountered example of state that has created problems for distributed systems is locking. Any form
of locking that spans requests - such as transactions - opens up this issue. The challenge is when you can remove
such a lock and what the state of the client might be if you do so. Techniques, such as eventual consistency or
optimistic concurrency or workflows, can be used to avoid this pitfall.
Use them.
Try to form the service as resource updates. Many operations can be framed as "store user input". The store then
triggers other work.
The user can come back for status on how the operation is proceeding including errors as appropriate.
However not all operations make sense as a resource. One simple example that comes to mind is "shutdown"
(turn off the server or service). While this could be implemented as some resource or resource property which
could be changed to trigger the shutdown it is really best suited to be an RPC. So use an RPC when a resource
The operations using the methods GET, PUT and DELETE are defined in HTTP to be idempotent, i.e. repeatable
without problems and with the same final result as if the operation had been done only once. Idempotent does
not mean free from side effects: PUT and DELETE change the state of the resource, but repeating them does not
change it further. This property needs to be preserved. Again, this facilitates distributed applications, especially
retry, since in many situations (especially in the face of network errors) the client is unable to tell whether an
operation it started has completed.
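A minimal in-memory sketch of what idempotence means for retries follows. The class name ResourceStore and the generated key format are hypothetical; the point is only that repeating a PUT or DELETE leaves the same final state, while repeating a POST does not.

```python
import itertools


class ResourceStore:
    """Toy resource store used to compare repeated PUT, DELETE and POST calls."""

    def __init__(self):
        self._data = {}
        self._ids = itertools.count(1)

    def put(self, key, value):
        # Replaces the whole resource: doing it twice equals doing it once.
        self._data[key] = value

    def delete(self, key):
        # Deleting an already deleted resource leaves the same final state.
        self._data.pop(key, None)

    def post(self, value):
        # The store assigns the key, so every call creates a new resource.
        key = f"item-{next(self._ids)}"
        self._data[key] = value
        return key

    def snapshot(self):
        return dict(self._data)
```

A client that cannot tell whether its PUT arrived can safely send it again; a client in the same position with a POST risks creating a duplicate.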
4 HTTP Client assumptions
4.1 HTTP semantics
Clients must respect the HTTP semantics. This includes policies around caching and TTLs. It also includes the
idempotence of requests when appropriate (e.g. GET). And it should include use of HTTP headers when the
HTTP headers currently defined provide the functionality the service or client uses. Identity, caching and encoding
are all examples where the functionality should be handled through existing HTTP headers.
4.2 Methods
The client code should be able to use the standard HTTP methods - including GET, PUT, DELETE, POST, and HEAD.
It should also be able to pass through extended methods (with the details of how covered elsewhere).
4.3 Redirects
The client library (in whatever form is appropriate to the hosting language) should accept and handle redirects
from the service being contacted. We will likely want policies around whether further requests should "stick" to
the new base URL or revert back to the original base (there are use cases for both). HTTP supports both
alternatives.
4.4 Connections
We should have support for persistent connections as the normal interaction is one where multiple requests will
typically be used between the client and the service.
4.5 Retry
Clients are expected to retry appropriately on failed (timed out) requests. They are also expected to be well
behaved when retrying. "Well behaved" includes appropriate backoff schemes and termination of retry after a
reasonable period including support for the “Retry-After” header.
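One way a "well behaved" retry loop might look is sketched below. It is an illustration under stated assumptions, not a prescribed client: send is a hypothetical callable that returns (status, headers) and raises TimeoutError on a timed-out request, the defaults (4 attempts, 0.5 s base, 30 s cap) are arbitrary, and only the delta-seconds form of Retry-After is handled (the header may also carry an HTTP-date).

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Exponential backoff with jitter: a delay in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(send, max_attempts=4, sleep=time.sleep):
    """Call send() until it succeeds or max_attempts is reached.

    Retries on TimeoutError and on 503 responses, then gives up by
    re-raising the last error. Any other status is returned to the caller.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            status, headers = send()
        except TimeoutError as exc:
            last_error = exc
            delay = backoff_delay(attempt)
        else:
            if status != 503:
                return status, headers
            last_error = RuntimeError("service unavailable")
            retry_after = headers.get("Retry-After", "")
            # Honor the server's hint when it is given in seconds.
            delay = float(retry_after) if retry_after.isdigit() else backoff_delay(attempt)
        if attempt < max_attempts - 1:
            sleep(delay)
    raise last_error
```

Because a timed-out request may in fact have been processed, a loop like this is only safe for requests that are idempotent.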
4.6 Security
The HTTP client should handle some aspects of security. This includes support for HTTPS for use in some
circumstances. Support for the client library (in the whole) should include support for validating security
certificates appropriate with Amazon policies. The client support should provide support for signing and or
4.7 Encodings
All of our clients should use UTF-8 as the preferred character encoding. Exceptions to this should be limited to
services or clients that have to work with external services (or clients) and where UTF-8 encoding is not available.
(For non-textual data, such as PNG resources, UTF-8 clearly isn't meaningful.) The client should use the content
type header to request the serialization in our internal formats, such as Ion binary or text, BSF datagram, JSON,
etc. In general XML is not appropriate internally as a data serialization format due to its size and decoding
complexity (if the data is XML then that's another matter, for example the Merchant data in single feed format).
When communicating with external parties XML certainly needs to be supported.
5 Resources
Resources are data that a service makes available to its users. The data may be encoded in any of a number of
data formats, but in general the service data in our Amazon services will be encoded in a way that the caller can
understand and operate on it (as distinct from an undifferentiated blob). Biblio records, Tibco datagrams, XML,
JSON, Stumpy and Ion are all encoding techniques we use today.
These definitions are very generic and open-ended. To better describe common HTTP usage patterns, this
document classifies resources into the following categories:
Workflow Engine A special manager for resources that represent autonomous processes
[mwh: these are somewhat different than in Alex's glossary.]
5.2 References as URLs
While a resource is data, often a resource contains data that serves as a key to access other resources. And while
from an information-theoretic point of view a raw key value, such as an integer customer id, has essentially the
same semantic value as a URL used to fetch the customer using the id, the URL form is much easier to use in
practice. As a result, when you are including references to entities that are themselves resources, it is
recommended that you use the URL form when that is practical. And when the "raw" value is required it may even
be useful to include a redundant copy in the form of a URL.
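As a small illustration of this recommendation, the sketch below builds an order representation that carries both the raw customer id and a redundant URL reference to the customer resource. The field names and the /customers/ path are hypothetical; the host is the example host used elsewhere in this document.

```python
BASE_URL = "http://example.amazon.com"  # example host; the paths below are assumptions


def order_representation(order_id, customer_id):
    """Return an order that references its customer both by raw key and by URL."""
    return {
        "orderId": order_id,
        # Raw key, kept for callers that need the bare value.
        "customerId": customer_id,
        # Redundant URL form, which a client can follow without knowing how to build it.
        "customer": f"{BASE_URL}/customers/{customer_id}",
    }
```

A client receiving this representation can GET the "customer" value directly instead of needing out-of-band knowledge of how customer URLs are formed.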
"System" resources, the root/:
This is the unidentified resource, i.e. no path. This should return a simple human-oriented page that could be used
as the starting point for ad hoc (i.e. developer) exploration of your service, offering links to documentation if that's
appropriate.
Or this may just fail. A developer-friendly page here is recommended.
"System" resources, status/status:
This is essentially the "ping" resource. As a minimum it will show that the service is operational.
"System" resources, schema/interface:
This is the base for "type" discovery. The schema for all the entities this service supports should be accessible
through this base. Generally the name of the user resource is a key that can be used to access the schema for that
resource. [cas: we might want an additional "sub directory" here like schema/resources or service
definition/schema] The schema version should be an optional key.
/ping:
A simple liveness test that returns no content and a “200 OK” status when the service is functioning normally.
In addition there should be information about the service itself accessible through this.
For Ion binary the schemata should include the symbol tables used for serializing the content to facilitate sharing
symbol tables across calls. See general principles - don't share state :)
PUTting these resources may be possible with appropriate access control. This is left as an exercise to the
implementer.
It should be possible to register for changes in some fashion: through versioning of the overall service (is
that the schema schema, or a value associated with 'status'?), through a publishing stream, or through a
pollable interface.
5.5 Versioning
Versioning is a feature that exists at (at least) 3 levels. The service API itself should support versioning so that
when the semantics of the service change sufficiently to affect the users a distinct version can be accessed. [cas:
my current take on service version and API versioning is this should be done at the service level, as opposed to
the API level.]
[mparthas: But doing it at the Service level is very coarse grained, isn’t it? What if only one API changes? Would it
not be simpler for the clients if only that API has a different version and the others remain unchanged? ]
The definition of the resources a service manages also changes over time. This change is handled by schema
versioning. Most service changes can be framed as schema version changes. Services should support older versions using
Finally, individual resource instances often require explicit versions. The instance version can be used to make
idempotent many operations that would not otherwise be idempotent. In addition, the instance version can be
used to enable optimistic concurrency, a non-locking concurrency model. Moreover, under this model a service
owner has the option of keeping a sequence of immutable instances to show how a resource has changed over its
lifetime. A tombstone can then be used to indicate that a particular resource has been deleted.
Versions should be monotonically increasing (i.e. always getting bigger). Schema versioning is defined as part of
the SDL work. Instance versioning is well handled by a reasonably large integer, which these days would be 32 or
64 bit.
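The instance-versioning ideas above (monotonically increasing integer versions, optimistic concurrency, immutable history and tombstones) can be sketched with a toy in-memory store. The names VersionedStore and VersionConflict are hypothetical, and using version 0 to mean "does not exist yet" is an assumption of this sketch rather than something this document defines.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a version that is no longer current."""


class VersionedStore:
    """Keeps every version of each resource as an immutable (version, value) pair."""

    def __init__(self):
        self._history = {}  # key -> list of (version, value); value None is a tombstone

    def current(self, key):
        versions = self._history.get(key)
        return versions[-1] if versions else (0, None)

    def put(self, key, value, expected_version):
        version, _ = self.current(key)
        if version != expected_version:
            # No lock was ever held; the stale writer simply loses and must re-read.
            raise VersionConflict(f"expected {expected_version}, current is {version}")
        self._history.setdefault(key, []).append((version + 1, value))
        return version + 1

    def delete(self, key, expected_version):
        # Deletion is recorded as a tombstone version rather than by erasing history.
        return self.put(key, None, expected_version)

    def history(self, key):
        return list(self._history.get(key, []))
```

Because each write names the version it was based on, a replayed write is rejected instead of being applied a second time, which is how the instance version helps make otherwise non-idempotent updates safe to retry.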
5.6 Concurrent Writes (PUT’s) & Resource Versioning
5.7 Keys Vs URLs
Resource entities can be identified by location or by name. Locations and names are separate concepts. A name is
location-independent. An entity with the same name might exist at multiple locations, or its location may change,
but its name is global.
Resource locations should be specified as URLs.
Resource names should be specified using the Amazon Common Identifier. Resource locations may embed
resource names. For instance, http://example.amazon.com/browse-node/amzn1.bn.1.289913 is an example of a
location (URL) that embeds a name (Amazon Common Identifier).
This may also be used for query in limited cases. A common one is to partially specify the key,
returning multiple resources. Another is to add query parameters to filter the results. [cas: needs detail]
The key may include the version of the resource, if the service's resources are versioned.
version plus one. This preserves cacheability. [cas: this needs to be verified.]
The resource being PUT should be an exact image of the resource you would expect to GET using the same key.
This means PUT supports NEITHER partial update NOR resources lacking their PK.
Not all resources can be PUT. PUT should generally be under tighter control than GET. For example, a read-only
query-based resource does not support the PUT method.
There is no common contract on the time lag between PUT-ting a resource and when that version of the resource
will be returned on the corresponding GET. Note that when the GET includes the new version of the resource this
can be used to trigger "extra effort" on the part of the service to find that version (with the expectation that it is
"in progress", or perhaps just on another host). This trigger however must be examined with an eye towards DoS
risk and general cost.
The litmus test of whether something can be PUT is - will you GET what you PUT? If you can't GET what you PUT,
use POST. If GET gets you what you PUT, then PUT is the right choice.
DELETE
Mostly like PUT, but it removes the resource. Many services may wish to create a tombstone as the new copy of
the resource.
DELETE requires the full key, and the version for concurrency control when appropriate.
POST
POST is used to create a new resource when the caller does not specify the name of the resource. POST is also
used for a variety of other non-CRUD purposes. POST is not defined to be idempotent. This does not mean that
POSTs are *not* idempotent, but it does mean that requests that are not idempotent must be channeled through
the POST method. This includes non standard methods that might not be idempotent. It also includes some forms
of "create". Any time the service must assign the key to a resource and, therefore, the user cannot provide the key
a priori, the resource creation must be done using POST. Partial updates are also an example of an operation that
is not idempotent, and certainly not "cacheable", so again partial updates must either be done through POST or be
an independent resource in their own right.
A common use for POST is to create a resource that the service assigns the identity to. For example, a service that
manages contacts might assign an immutable contact_id to each contact. This id is programmatically determined
by the service; the client has no way to know what the contact_id will be for the new contact. As such, the client
cannot PUT the new contact, as the URL of the contact will include the id, which is not yet known. A POST to the
same URL (leaving off the id) could be defined by the service to create a new instance of the resource and make
the body of the POST its content. This call would then return the new resource, or resource location, to the
caller.
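The contact example can be sketched as a toy service. This is an illustration only: the /contacts path, the ContactService class and the choice of a 201 status with a Location header are assumptions of the sketch (they follow common HTTP practice but are not mandated by the text above).

```python
import itertools


class ContactService:
    """Toy contacts service in which the service, not the client, assigns contact_id."""

    def __init__(self, base_path="/contacts"):
        self.base_path = base_path
        self._contacts = {}
        self._next_id = itertools.count(1)

    def post(self, path, body):
        """Create a contact by POSTing to the collection URL (the URL without an id)."""
        if path != self.base_path:
            return 404, {}, None
        contact_id = next(self._next_id)
        contact = dict(body, contact_id=contact_id)
        self._contacts[contact_id] = contact
        # Return the location of the new resource so the caller can GET or PUT it later.
        return 201, {"Location": f"{self.base_path}/{contact_id}"}, contact

    def get(self, path):
        prefix = self.base_path + "/"
        if path.startswith(prefix) and path[len(prefix):].isdigit():
            contact = self._contacts.get(int(path[len(prefix):]))
            if contact is not None:
                return 200, {}, contact
        return 404, {}, None
```

Once the caller has the returned location, the full name of the resource is known and later updates can go through PUT.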
->
PUT
creates or updates an entity resource. It is used when the caller knows the full name of the resource.
->
The value provided should be the full and exact representation you want returned by subsequent GETs. Some
kinds of entity resources might not support PUT.
DELETE
destroys an entity resource. It is used when the caller knows the full name of the resource.
->
Note that services are permitted to effect deletion by creating a tombstone version of the specified resource.
A workflow instance is a special case of entity that represents an active autonomous process within the system.
Such workflow instances have resource names, and their current state can be obtained by a client using GET. PUT
and DELETE might also be supported by workflow instances.
->
DELETE
destroys an entity resource via the recycler. The caller must know the name of the resource.
URI full name of the recycler resource
query parameters or body name (or characteristics) of the target resource
->
GET
returns status information about the factory or recycler. Support for GET for factory/recycler resources is optional.
URI full name of the factory or recycler resource
->
A workflow engine resource is a special case of a factory/recycler resource. It supports the same HTTP methods,
but the resources it returns are workflow instances (see above).
->
Note that a single resource might function as both a collection resource and a factory/recycler resource. In this
case, GET would typically support collection query operation.
PUT
It is typically not appropriate to PUT (completely replace) a collection resource. If your application supports it, PUT
should be used as in the entity resource case.
->
GET
may be used when the algorithmic resource is stateless, has no side-effects and returns the same resource for
any given input. This will make the results cacheable. When using GET, the HTTP request MUST NOT contain a
body. Alternatively, GET can be used to return status information about the algorithmic resource itself. In this
case, GET should be used as in the entity resource case.
Note that creating or updating a resource with PUT or POST does not always imply that a representation of that
version of the resource will be immediately available via GET. Eventual consistency and other considerations might
delay the availability of recently updated resources. Details of such behavior are service-specific. [mparthas:
Usually, a 202 Accepted response code is returned to the user (with the location URL containing the link for the
client to try at a later time).]
The HTTP specification further describes the behavior of the primary methods: Requests for safe HTTP methods do
not change the state of resources on the server. GET and HEAD are intended to always be safe. Requests for
idempotent HTTP methods can be repeated without causing additional side-effects. GET, HEAD, PUT, and DELETE
are intended to always be idempotent. Service owners MUST conform to this behavior.
More details on these interaction patterns, including details on the returned HTTP status codes, can be found in
the document "HTTP Overview for REST-Style Interactions".
7 Protocol Semantics
7.1 Caching
HTTP 1.1’s caching constructs are powerful tools for optimizing RESTful interactions over HTTP.
Services SHOULD take full advantage of HTTP 1.1 caching primitives for all GET requests including GET-style
requests that use the query-string field parameters. For simplicity and consistency, services SHOULD NOT allow
caching of PUT, POST, and DELETE operations, by explicitly including a "Cache-Control: no-cache" in all
responses. Further, Cache-Control SHOULD be "public". max-age and max-stale should be avoided.
[mparthas: need to provide details on why these should be avoided]
For GET requests, services SHOULD specify the Cache-Control, Last-Modified, ETag, and Vary headers in the
response. If the item can be cached, the server SHOULD set a reasonable Expires time.
Servers MUST support conditional GETs and return 304 Not Modified as appropriate. The ETag SHOULD
be set such that it uniquely identifies a particular version of the resource for all references to that resource
independent of which individual host or service is vending that resource. If an ETag cannot be guaranteed to be
the same for all instances of the same resource, then the ETag SHOULD NOT be used. The ETag SHOULD be
generated in a way that is cheap to generate and cheap to compare. For instance, at build time, a
checksum could be generated for read-only file resources and stored as the ETag value.
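A conditional GET driven by a checksum-based ETag might be sketched as follows. The helper names are hypothetical, SHA-256 is just one possible checksum, and the sketch only handles a single exact value in If-None-Match (the header may also carry a list of tags or "*").

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Derive the ETag from the content itself, so every host computes the same value."""
    return '"' + hashlib.sha256(content).hexdigest() + '"'


def conditional_get(content: bytes, request_headers: dict):
    """Return (status, headers, body) for a GET that may carry If-None-Match."""
    etag = make_etag(content)
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy is still current: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, content
```

Because the tag depends only on the bytes of the resource, it stays identical across hosts vending the same version; for read-only files it could be computed once at build time and stored rather than recomputed per request.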
The Last-Modified header SHOULD return the last time at which the resource was actually modified in its
source form, and not the time at which its copy was most recently written to disk. Care should be taken to
preserve mtime on UNIX-based file resources across various deployment methods.
The Vary header MUST be used to specify which header fields can invalidate a cached object when they
change. For instance, the Amazon Global Action Trace MAY differ from request-to-request, but should
Services MUST NOT assume that client caches can be invalidated. The only guaranteed way to force a
refresh of an item with an Expires header set is to update any references to that content to point to a new URL for
that resource. Because of this, unless it is possible to update all references, Expires headers SHOULD
NOT exceed 5 minutes in the future before a revalidate is required.
Clients SHOULD NOT override server cache-control headers. If a client does choose to override the
caching directives, the client MUST make proper use of the Client-Controlled-Behavior specification in section
13.1.6 of HTTP 1.1 including the cache-request-directives in Section 14.9.
Clients and services MUST identify the media type of the body of all HTTP requests and responses in the Content-
Type header.
Services MUST recognize the Accept, Accept-Charset, and Accept-Language HTTP request headers. The service
MUST respond with a representation that matches the requested encoding. If a service is unable to encode the
resource in a compatible representation, or accept a resource in the given representation, it MUST return an error.
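The negotiation rule above can be sketched as a small function. This is a simplified illustration: the list of supported media types is hypothetical, q-values in the Accept header are ignored, treating a missing Accept header as "anything is acceptable" is an assumption, and 406 (Not Acceptable) is used as the error status because it is the standard HTTP code for this case, not because this document names it.

```python
SUPPORTED = ("application/json", "application/xml")  # hypothetical service capabilities


def negotiate(accept_header, supported=SUPPORTED):
    """Return the first supported media type listed in Accept, or None if none match."""
    if not accept_header:
        return supported[0]
    for item in accept_header.split(","):
        # Drop parameters such as ";q=0.9"; this sketch does not rank by q-value.
        media_type = item.split(";")[0].strip().lower()
        if media_type == "*/*":
            return supported[0]
        if media_type in supported:
            return media_type
    return None


def respond(accept_header):
    """Return (status, headers) for a request with the given Accept header."""
    media_type = negotiate(accept_header)
    if media_type is None:
        return 406, {}
    # The chosen representation is always identified in Content-Type.
    return 200, {"Content-Type": media_type}
```

A fuller implementation would honor q-values and media ranges such as "application/*", and would apply the same idea to Accept-Charset and Accept-Language.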
TODO: OAuth Text
[comment: Protection against replay attack would come from enforcing a version-# (from a version-# series) or a
transaction-# (e.g. GUID) embedded in a request.
Consider that one transfers some money from Bob’s account to Jasper’s account. Hence, a message setting
Jasper’s balance to $324.64 is sent. Then, some request transfers money from Jasper’s account to John’s account,
setting Jasper’s account down to $224.64. Now, I replay the earlier message. Jasper’s account is now back to
$324.64 (!!!).
Assuming both Bob’s and Jasper’s accounts are under the same resource manager, ACID behavior can be easily
achieved. Protection against replay attack may be done by establishing a 2-phase business protocol: (1) the client
asks the service for a unique # (version # or transaction GUID #; the number series itself does not need to be
secure-random); (2) the client includes this unique ID in the actual transfer request payload. This unique ID is a
part of the data that is signed. Any replay of the same unique ID will either be ignored or answered with a failure
message.]
[ay46: We are explicitly against resource locking in our overall REST design principle. We think about this kind of
issue further.]
7.4 HTTP Redirect
[ay47: Fill in the reasons behind why we are using HTTP Redirect in the REST context. Also, mention the
restrictions that a service should enforce to avoid any security implication of HTTP redirect.]
7.5 SSL
TODO: Clean up.
I can’t think of a single case where it is NOT OK to use SSL. Where it is OK to NOT use SSL is usually in requests that
contain no sensitive information, and that are not sensitive to replay or man-in-the-middle attacks. Generally, if
you need to use sessions at all, you are starting to get into a situation where SSL is important. If it is a valuable
transaction, or you are transmitting sensitive data, you really need to use SSL.
It is typically not acceptable to skip SSL even if you encrypt the payload or the headers. The reason is that SSL is
session based, and encrypts the entire session. It is much more difficult, therefore, to replay an SSL protected
transaction than one that is protected within the headers. The one that is protected within the headers requires the
attacker to capture only a single packet to replay in the worst case scenario. That is not going to be possible with SSL.
Furthermore, since SSL uses session keys, once the session ends the server’s ability to parse a replayed request is
lost, reducing the need for strict time synchronization. Strictly speaking, you can provide the same level of security
with a custom protocol to sign and encrypt the headers, but by doing so you would end up re-inventing SSL, so you
may as well go with that in the first place.
[mwh48: I’m not sure how complete we’ll get here. Could mention oauth, message signing and encryption (as
opposed to SSL), and the IAA project.]
7.6 Cookies
RESTful interactions are intended to be stateless. Cookies create shared state. Further, cookies are a
server-initiated shared state that is transparent to the application.
Because of this, HTTP cookies, while convenient, break the semantics of RESTful interaction. If a service chooses
to create a mechanism for shared state between requests it can do so as a resource (e.g. a session resource).
[ay49: Personally I do not consider cookies exactly as a state sharing mechanism. I consider them a client side state
storage mechanism. From the service standpoint, a request from a cookie-capable client can be 100% stateless as well.
My reservation about cookies is that some HTTP clients are not cookie capable (intentionally). Cookies complicate the
REST interaction further (e.g. cookie-path, expiration, and security related implications).]
Most services, especially those internal to our systems, are passing resources that the clients will need to
understand in detail. As such the resource needs to be encoded in an understandable way. This requires a
serialization format that is machine independent, reasonably flexible, reasonably capable, and reasonably efficient.
Example serialization formats include XML, JSON, Ion, Biblio Records, and Tibco datagrams (aka BSF
Datagram/Dictionary).
The choice of the serialization format we offer to our outside customers is dictated for the most part by customer
requirements. This applies to services that are our product to our customers.
Typically this will be JSON for users who prefer REST interfaces and XML for those who prefer SOAP. A few years
ago other serializations would have been the "right" choice, perhaps DCOM, perhaps "Excel", perhaps CORBA or
ASN.1. A few years from now another format will be the "right" choice. In general we will need to support
multiple serializations for our customers, and the "right" choice will be driven first by customer demand, and
second by technical merits.
Between Amazon-owned services the issues are different: the technical considerations are much more
important, in part because the sales aspect has a lower priority and issues like TCO (both hardware and wetware)
play a bigger role. Inside the firewall Ion is the preferred format.
[mwh51: This is a key justification for all of this, not just for Ion. Be wordy here.]
application/<schema>-<format>
where
<schema> is the versioned Ion schema that defines the content body.
<format> is the serialization format being used - ion, ion-binary, ion-text, or JSON (or xml).
[cas: these constants need to be examined carefully, but the set of choices should be reasonably stable.]
By versioning the schema here we can often (but not always) avoid versioning the API itself.
NOTE: this interaction should be tested: how do intermediate caching tools handle the Content-Type and Accept
headers?
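The "application/<schema>-<format>" convention can be built and parsed mechanically. The sketch below, in Python, assumes the format names listed above (which the text notes are still under review) and treats the schema name as an opaque string; the example schema name "businessObject-v1" is hypothetical. Because both schema names and format names may contain hyphens, the parser matches a known format suffix rather than splitting on the last hyphen.

```python
from typing import Tuple

# Format names taken from the list above; subject to change per the note there.
FORMATS = ("ion-binary", "ion-text", "ion", "json", "xml")
PREFIX = "application/"


def build_media_type(schema: str, fmt: str) -> str:
    """Compose a media type from a versioned schema name and a format."""
    if fmt not in FORMATS:
        raise ValueError("unknown serialization format: %r" % fmt)
    return "%s%s-%s" % (PREFIX, schema, fmt)


def parse_media_type(media_type: str) -> Tuple[str, str]:
    """Split a media type into (schema, format).

    The format is matched case-insensitively; the schema keeps its case.
    """
    value = media_type.strip()
    if not value.lower().startswith(PREFIX):
        raise ValueError("not an application/ media type: %r" % media_type)
    rest = value[len(PREFIX):]
    for fmt in FORMATS:
        suffix = "-" + fmt
        if rest.lower().endswith(suffix) and len(rest) > len(suffix):
            return rest[: -len(suffix)], fmt
    raise ValueError("no known format suffix in %r" % media_type)
```

For example, `parse_media_type("application/businessObject-v1-ion-binary")` yields the schema "businessObject-v1" and the format "ion-binary".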
security related
cache control related
[ay55: TODO: fill in more?]
name: "com.amazon.EntityService",
major_version: 1,
minor_version: 0,
rest_http_uri: "EntityService",
entries: [
package:: {
name: businessObject,
major_version: 1,
minor_version: 0,
rest_http_uri: "business-object",
entries: [
type:: {
name: BusinessObject,
major_version: 1,
minor_version: 0,
type:: {
name: NotFoundException,
major_version: 1,
minor_version: 0,
base: struct,
fields: [
{
name: reason,
type: string
}
],
rest_http_error_code: 404
},
type:: {
name: UpdateFailedException,
major_version: 1,
minor_version: 0,
base: struct,
fields: [
{
name: reason,
type: string
}
],
rest_http_error_code: 409
},
operation:: {
name: get,
major_version: 1,
minor_version: 0,
in: string,
out: BusinessObject,
exceptions: [ NotFoundException ],
rest_http_method: get,
rest_http_uri_data: "{in}"
},
operation:: {
name: put,
major_version: 1,
minor_version: 0,
in: BusinessObject,
out: void,
exceptions: [ UpdateFailedException ],
rest_http_method: put,
rest_http_uri_data: "{in.id}",
rest_http_body_data: "{in}"
}
]
}
[ay58: What if people use an operation name beyond the HTTP verbs “get”, “put”, “delete” and “post”? How do we
describe an RPC call in SDL in the context of REST?]
[mwh59: I am rethinking this stuff (syntax-wise). Might want to add a note that it’s for demo purposes only, and
subject to change.]
[ay60: Will “{in}” overwrite “{in.id}”?]
The "rest_http_uri" fields are concatenated to produce the base URI path for an operation, in this case
"EntityService/business-object".
The SDL is not REST-specific. It supports a general model of operations that take one input value and produce one
output value. With REST, however, input data can come from several sources: trailing components of the URI,
query parameters, and the HTTP message body. The "rest_http_xxx_data" fields map these various inputs onto the
single operation "in" data value. The "{in.a.b.c}" syntax is a minimal path language specifying a location in the input
data value. This syntax also permits multiple path components to individually contribute input data (e.g.
"{in.merchantId}/{in.merchantSKU}"). Note: this example does not (yet) represent our suggested best practice for
specifying service interfaces in SDL.
[ay61: I am for this feature in general. But, be careful of the slippery slope. And, we want to make sure this
minimal path language would be a clean subset of the general query language that we would define in future.]
Note: the SDL's REST semantics are not currently fully specified. This will be rectified in the near future.
[ay62: I think we need a formal SDL document that describes what the “rest_http_*_data” attributes mean in SDL.
The description here serves as a primer or an example. That is fine. But, we need a more formal description, I think.]
Note: currently the SDL is not integrated directly into our service frameworks. As such, use of SDL does not
currently imply any particular run-time support.
[ay63: I think we can leave this negative comments out for now.]
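To make the "{in.a.b.c}" path language concrete, the following Python sketch expands a "rest_http_uri_data" template against an input value and appends the result to the concatenated "rest_http_uri" path. Since the SDL's REST semantics are not fully specified, several details here are assumptions: the function names are hypothetical, the expanded data is joined to the base path with "/", and each substituted value is percent-encoded as a single path segment.

```python
import re
from typing import Any, Sequence
from urllib.parse import quote

# Matches "{in}" or "{in.a.b.c}"; group 1 holds the dotted tail (may be empty).
_PLACEHOLDER = re.compile(r"\{in((?:\.[A-Za-z_][A-Za-z0-9_]*)*)\}")


def resolve(path: str, value: Any) -> Any:
    """Follow a dotted path such as ".a.b.c" into nested mappings."""
    for key in filter(None, path.split(".")):
        value = value[key]
    return value


def expand(template: str, in_value: Any) -> str:
    """Substitute every {in...} placeholder with its percent-encoded value."""
    def substitute(match):
        return quote(str(resolve(match.group(1), in_value)), safe="")
    return _PLACEHOLDER.sub(substitute, template)


def operation_uri(base_parts: Sequence[str], uri_data: str, in_value: Any) -> str:
    """Concatenate the rest_http_uri fields, then append the expanded data."""
    return "/".join(base_parts) + "/" + expand(uri_data, in_value)
```

With the example service above, `operation_uri(["EntityService", "business-object"], "{in}", "abc")` gives "EntityService/business-object/abc".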
All REST services at Amazon SHOULD export an SDL schema document at
"http://host:port/ServiceRootPath/interface".
[ay64: Using the “*Path” term would make it more familiar to Java Servlet API users.]
This SDL document should contain definitions for all types and operations understood by the current version
of the service, including all previous versions of those entities.
[mwh: if we had a central repository, we could return -only- the names of the top-level definitions. do we want to
prepare for such an eventuality?]
todo: returning type/ver for data (to look exactly like entity name versioning)
Single instances of a binary data object, such as a JPEG file or a block of raw data for encryption, SHOULD be
represented directly in the body of an HTTP message, using the appropriate “Content-Type” header.
Complex data structures containing binary data, for example a struct containing a few strings and one large blob,
can be problematic. Use of binary Ion is the preferred mechanism for accommodating this situation, since it
represents mixtures of various kinds of data, including blobs, with minimal space overhead. Text formats, such as
text Ion and JSON, would require encoding the binary data in a representation such as base64, which has high
space and processing overhead for large data objects. For simple, flat data structures, the “multipart/mixed”
media type MAY be used as an alternative to enable binary data to be efficiently encoded.
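The space overhead of base64 mentioned above is easy to quantify: every 3 bytes of input become 4 characters of output, so a large blob grows by about one third before any other text-format overhead. The short Python check below illustrates this with the standard library; the helper name `base64_encoded_length` is introduced here only for illustration.

```python
import base64


def base64_encoded_length(raw_length: int) -> int:
    """Length in bytes of standard, padded base64 for `raw_length` input bytes."""
    return 4 * ((raw_length + 2) // 3)


if __name__ == "__main__":
    blob = bytes(300_000)  # stand-in for a binary payload
    encoded = base64.b64encode(blob)
    # The measured length agrees with the 4/3 formula.
    print(len(blob), len(encoded), base64_encoded_length(len(blob)))
```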
9.1 General Query semantics
[comment: I am still re-organizing content under this big section.]
[mwh66: Two sections here on Query, neither of which are Alex’s latest.]
Per RFC 3986, a query forms part of a URI as follows:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
In general, the query part of a URI exists to provide additional parameters for a resource request, in particular in
the context of CGI requests where the path component identifies a CGI script, and the query parameters provide
input into that script.
Whilst the query parameters provide great flexibility, the implicit non-hierarchical nature of the query provides a
substantial drawback in that it prohibits trivial remapping of resources. In particular, RFC 3986 relative reference to
a URI considers a query to not be part of a relative reference.
A RESTful URI on the other hand is by intention, hierarchical in nature. For example,
“/weblogs/myweblog/entries/100” considers “100” to be most specific, and “weblogs” to be least specific.
Similarly, “/Universe/Earth/37.0,-95.2” considers the coordinates “37.0,-95.2” to be a more specific location than
Earth (but is specific to Earth).
A benefit of hierarchical URIs for web services is that they allow trivial redirection for specialization. For example,
in the context of a service such as Sable, valid RESTful URIs include “item/v1/4/3551551677” to identify item scoped
data (“item”) in region “4”, with ASIN “3551551677”. The same entire fleet could choose to process data in region
“1”, or may choose to specialize at the region level of the path. Doing this specialization on a non-hierarchical
query such as “sable?scope=item;region=4;asin=3551551677” would be cumbersome and prone to errors.
[ay69: Where does this “1” come from? “v1”?]
[ay70: Both choices are “region” levels?]
There are, however, scenarios in which a query does make sense. For example, the Product Aggregator Service applies
filters on its output data using a concept called “facets” to reduce what data is retrieved by a client. Likewise, there
are a number of applications where it is desirable to paginate output, such as “?start=1;count=20”. Both of these
fall into the same scope as database queries (i.e. they can be mapped to a SELECT statement and/or WHERE
clause) and do not in themselves define a resource.
[ay71: Chris Suver prefers something more similar to “?page=3”. Allow the server side to control the size of the
page.]
[ay72: It does not define a new entity resource. But, it does define a new query resource.]
Correct use of Query Parameters
1. An individually identifiable resource MUST NOT be identified by a query parameter.
As a resource MUST be hierarchical in nature, it is not appropriate to fit it into a non-hierarchical
component of a URI.
[ay73: This “MUST NOT” restriction is not so practical. This kind of restriction does not exist in other parts of the
query world (e.g. “select * from employee where empNo=123”). At most, it is just a “SHOULD NOT”.
Alternatively, we could say: the resource, where the query targets at, SHOULD not contain a data set which is too
large. (i.e. a smaller data set automatically c]
1. A query parameter consists of “name=value” pairs, where both the name and value are individually
encoded using only the RFC 3986 “unreserved” characters. To avoid ambiguity, the “name” portion must
start with an alphabetical character and consist of only the set of characters “a-z”, “A-Z”, “0-9”, “_”, “.”
and “-”.
o Characters outside of the “unreserved” range in the query-string MUST cause the request to fail.
2. The “name=value” pairs are separated by “;”.
o “;” SHOULD be accepted and is desired per W3C recommendation.
o “&” MAY be accepted to support traditional use.
[ay78: Why related to ambiguity? I guess it’s more related to syntax simplicity? Why allow white spaces? Why
allow “-”? The “-” character usually gives some syntax in other programming languages later. And, why not “_”?]
[ay79: If we may accept “&”, we can only say “SHOULD” for “;”.]
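The query parameter rules above translate directly into a small validating parser. The Python sketch below is one possible reading of those rules, with hypothetical names (`parse_query`, `BadQuery`): names must match the restricted character set, values may contain only RFC 3986 unreserved characters (so, read literally, a percent sign is rejected too), “;” is always accepted as a separator, and “&” is accepted only when the service opts in.

```python
import re
from typing import List, Tuple

# Name: starts with a letter, then letters, digits, "_", "." or "-".
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_.\-]*")
# RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
UNRESERVED_RE = re.compile(r"[A-Za-z0-9\-._~]*")


class BadQuery(ValueError):
    """Raised when a query string violates the rules; the request must fail."""


def parse_query(query: str, allow_ampersand: bool = True) -> List[Tuple[str, str]]:
    """Split a query string into validated (name, value) pairs."""
    separator = r"[;&]" if allow_ampersand else r";"
    pairs = []
    for part in re.split(separator, query):
        if not part:
            continue  # tolerate empty segments such as a trailing ";"
        name, equals, value = part.partition("=")
        if not equals:
            raise BadQuery("missing '=' in %r" % part)
        if not NAME_RE.fullmatch(name):
            raise BadQuery("invalid parameter name %r" % name)
        if not UNRESERVED_RE.fullmatch(value):
            raise BadQuery("invalid character in value %r" % value)
        pairs.append((name, value))
    return pairs
```

Applied to the pagination example, `parse_query("start=1;count=20")` returns the pairs ("start", "1") and ("count", "20").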
9.2 Rich Query Syntax
[JH81: This should be filled in with Alex’s latest document.]
Most services have some form of query. At a minimum GET is a very specialized form.
A list should be defined by SDL: if your result may be a list it should be defined as such; if it is always going to be a
singleton then it should not be a list.
Just as "list of" should be defined in the service's SDL, so should a URL as a value (as distinct from PKs), so we
need an SDL type for URLs.
Straight GET: the PK information is included in the URL and it retrieves a specific resource. While this is
often not thought of as query in the REST community, it certainly is when viewed from a database 'theory' point of
view, and shouldn't be overlooked as such.
GET with query parameters: here the base URL does not specify a singleton resource but a "root" or view (i.e.
table) of some kind. The query parameters define a filter (details should align with current use) that restricts the
return to a subset of the "multi-entity" resource.
"Real query": this is the most general case, and should be embedded in a POST as a specific method, say
"QUERY".
[ note: One question to address is the query language. There is a long history of individual services offering
individual query languages. General purpose query, such as SQL or XQuery, is a bit hairy as it offers the ability to
write arbitrarily expensive operations. We should have some suggestions here and some alternate methods for
alternate languages. At a minimum we need an idiom for including the query language in the URL, the header, or
the SDL entry point definition, including specifying "subsets" of various standard languages. ]
A variant on the full "query" is filter. This is a step above the query parameters, which really only offer "and" as a
conjunction. It would allow a more complete boolean predicate but not offer projection (changing the shape of
the returned contents) nor joins (self or otherwise, but where multiple tables or views are used to specify the
results). Joins in particular tend to generate expensive query plans and are more complicated to implement.
As well as a natural view of a resource, there can be other views of a resource. Characteristics of such views can
include:
Filtered View – only a subset of a view of a single resource is returned. This view may be filtered for, e.g.
performance reasons (reducing bandwidth) or security reasons (reducing visibility to sensitive fields).
Projection View – a single resource is projected onto a different shape. A natural projection of a view is to
handle different versions of a schema. It is possible that additional resources are joined in this projected
view (e.g. as done by Product Aggregator Service).
Such views may be pre-configured, or potentially may be driven by a customizable view mechanism.
[mwh82: Some language about whether same or different resource. That views have (or don’t have) schema. But
then what about dynamic views?]
9.4 Definitions
This section serves to create working definitions for the discussion that follows.
Natural View
This is the most natural view of a resource, and would be equivalent to the SQL expression “SELECT * FROM
<table> WHERE <primary-key> = <resourceId>”.
The schema from this view will be considered the primary schema.
Filtered View
This is a reduction of the natural view, and would be equivalent to the SQL expression “SELECT field1, field2, …
FROM <table> WHERE <primary-key> = <resourceId>”.
This view uses the same primary schema as the natural view, and therefore imposes the constraint that
only optional fields (per schema) may be removed by filtering.
[mwh83: This seems excessively constraining. Do we really want this?]
Projected View
This is a view constructed from one or more natural views, and projected onto a different view. This view would
typically be described by a different schema. Joined views are included in this definition.
Pre-configured View
This is a filtered or projected view that is defined by the service owner (either by code or by configuration).
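As an illustration of the filtered view definition, the Python sketch below removes fields from a natural view while enforcing the constraint, as currently drafted, that only optional fields may be dropped. The function name, the representation of a resource as a dictionary, and the way required fields are supplied are all assumptions for this example; a real service would take them from the primary schema.

```python
from typing import Any, Dict, Iterable, Set


def filtered_view(
    natural: Dict[str, Any], keep: Iterable[str], required: Set[str]
) -> Dict[str, Any]:
    """Return a reduction of the natural view containing only `keep` fields.

    Because a filtered view shares the primary schema, every field that the
    schema marks as required must survive the filter.
    """
    keep_set = set(keep)
    dropped_required = required - keep_set
    if dropped_required:
        raise ValueError(
            "filter would remove required fields: %s" % sorted(dropped_required)
        )
    return {name: value for name, value in natural.items() if name in keep_set}
```

A projected view, by contrast, would produce a differently shaped result described by its own schema, so it is not subject to this check.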
9.6 Amazon REST Approach
[ay86: The title of this section is a bit too generic.]
(Note: Is there any pre-existing means/header to specify the schema for PUT? Maybe mime type?)
The filter MAY be specified by application defined [2] query parameters. For example, PAS defines a query parameter
“Facets” to allow a list of filters to be passed.
The filter MUST NOT be specified during PUT operations [3].
[ay90: Reconcile with “Mutability” paragraph above.]
The filter MAY have the schema version specified. That is, the filter is applied to a specific version of a schema.
[ay91: These two statements are a bit underspecified. An example would be good.]
Projection
Projection is defined as any/all of the following cases:
The view is an alternative composition of the natural view that cannot be described as a different (older,
newer) version of the natural view schema and/or does not have a simple functional relationship.
o This view may use the same primary key as the natural view.
o This view may contain multiple repeated instances of a single property from the natural view.
o This view may contain computation results based on the natural view.
o This view may (but need not) use a different primary key.
A projection MUST use a different namespace to the original resource. For example, if the original resource is
/a/b/c/<primary-key> then the projected view MUST NOT use the resource name /a/b/c/<primary-key> for its
projection or use this as the prefix of its projection.
The projection MAY be a different service whose purpose is to provide projections to the underlying resource [4].
Each different projection of a resource MUST use a different resource name to any other projection of a resource [5].
A projection MUST follow the naming conventions to define the projection’s schema.
[2] Should this be application defined? Can we be more specific? PAS uses the query parameter “Facets” to list a
series of top level fields/detail levels that results in fields being excluded. PAS itself returns a projection as part of
its service definition. This filter does not modify the projection, but rather filters it.
[3] Should any query parameters ever be specified?
[4] E.g. PAS
[5] That is, should you identify projection A of a resource, then projection B can be considered a projection of A as
well as a projection of the original resource, and follow the same rules.
/projection/<projection-name>/a/b/c/<projection-primary-key>
Then it may be desirable to access the underlying projection as the following resource:
/projection/<projection-name>
Note: should we leave defining the format of the projection outside of scope of this working group?
10 Extended Use-Cases
10.1 Batching
Batching is a common technique for improving implementation efficiency and reducing latency for a series of
invocations in request/reply interactions. Batching involves sending multiple invocation requests in a single HTTP
request, and receiving the results of those invocations in the single corresponding HTTP response.
A client sends a batch request to a URL that supports batching. The URL will typically be the top-level name of the
target service. The HTTP method MUST be BATCH. If it is inconvenient or impossible to use “BATCH” for the
HTTP method, the client can use POST with the “x-http-method-override: BATCH” header.
[mwh92: Did I say that? Do we agree on this?]
The body of the HTTP request MUST be a sequence of individual complete HTTP request
messages that represent the individual requests. The body of the HTTP response MUST be a sequence of
complete HTTP response messages that represent the individual responses, in the same order as their
corresponding requests. The HTTP response code of the enclosing response message should indicate success if the
batch is processed at all by the service. The status of each individual request is returned in the corresponding
individual reply. The URLs of the enclosed requests SHOULD refer to the same service that the batch is directed to.
[mwh93: MUST ?]
The content type of a batch HTTP request or reply MUST be “application/octet-string”, since the enclosed messages
can contain arbitrary data. The content-length of a batch HTTP request or reply MUST be the sum of the total
length (i.e. including headers) of the individual messages.
The content type of a batch HTTP request MUST be one of “text/json”, “text/x-amzn-ion”, or
“application/x-amzn-ion”. The following combinations of batch request content type and embedded request content type are
permitted:
The individual requests MUST contain "url", "method", and "headers" fields and MAY contain a "body" field. The
individual responses MUST contain "status", "reason", and "headers" fields and MAY contain a "body" field. The
URL of every individual request MUST be a relative URI. That URI is appended to the URI of the HTTP request.
The individual requests do not need to share any characteristics; they are completely independent. No individual
request headers are inherited from the main HTTP request.
The individual requests in a batch can be executed by the service in any order, or in parallel. The batch mechanism
does not imply any kind of transactional or all-or-nothing semantics.
BATCH http://someHost:1234/MyService
Content-Type: application/octet-string
Content-Length: ...
[
{
"url": "/order/5678",
"method": "PUT",
"headers": [ "content-type: text/json", ... ],
"body": { "field1": "data1", ... }
},
{
"url": "/order/5678",
"method": "GET",
"headers": [ "accept: text/json", ... ]
},
...
]
HTTP/1.0 200 OK
...
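The structured form of a batch shown in the example above (a list of entries with "url", "method", "headers" and an optional "body") can be assembled with a few lines of code. The Python sketch below is illustrative only: the function names are hypothetical, the content type is left to the caller because this draft still lists more than one candidate, and the form in which the body is a sequence of complete HTTP messages is not covered. It also shows the POST fallback with the “x-http-method-override: BATCH” header and the pairing of responses with requests by position.

```python
import json
from typing import Any, Dict, List, Optional, Sequence, Tuple
from urllib.parse import urlsplit


def make_item(
    method: str, url: str, headers: Sequence[str], body: Optional[Any] = None
) -> Dict[str, Any]:
    """Build one individual request entry; its URL must be relative."""
    parts = urlsplit(url)
    if parts.scheme or parts.netloc:
        raise ValueError("individual request URL must be relative: %r" % url)
    item = {"url": url, "method": method, "headers": list(headers)}
    if body is not None:
        item["body"] = body  # "body" is optional
    return item


def make_batch_request(
    service_url: str,
    items: List[Dict[str, Any]],
    content_type: str,
    use_override: bool = False,
) -> Dict[str, Any]:
    """Wrap the individual requests in a single enclosing HTTP request."""
    body = json.dumps(items).encode("utf-8")
    headers = {"Content-Type": content_type, "Content-Length": str(len(body))}
    method = "BATCH"
    if use_override:
        # Fallback for clients or intermediaries that cannot send BATCH.
        method = "POST"
        headers["x-http-method-override"] = "BATCH"
    return {"method": method, "url": service_url, "headers": headers, "body": body}


def pair_responses(
    items: List[Dict[str, Any]], responses: List[Dict[str, Any]]
) -> List[Tuple[Dict[str, Any], Dict[str, Any]]]:
    """Match responses to requests by position (responses keep request order)."""
    if len(items) != len(responses):
        raise ValueError("response count does not match request count")
    return list(zip(items, responses))
```

Because no individual request headers are inherited from the enclosing request, each entry carries its own complete header list.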
There are two other techniques that can also be used to address the same issues.
The client can also open several connections to the service and execute calls in parallel. One disadvantage of this
is increased resource utilization on both the client and server (i.e. sockets). Another disadvantage is the increased
complexity of managing multiple connections, although this might be reduced by library code. In fact, with a
sufficiently sophisticated client, the batching protocol above could be automatically transformed into parallel calls
using multiple connections.
Streaming of HTTP requests would have been the preferred alternative over both batching and multiple
connections, but it is not well supported by existing tools and libraries. Supporting streaming becomes especially
problematic with intermediate caches and proxies, basically any process that could divide the stream of requests
to multiple destinations. Most such intermediate components simply do not support HTTP streaming.
The first is a PUT (or POST) to the resource URL where the body contains only a portion of the actual resource, for
example POSTing just the zip code. The idea is that only the data in the body would be applied to the identified
resource (the resource identified by the URL). In reality this is the sort of update most merchants apply on item
data, but the pipeline has a fair amount of code, policy, and metadata to drive the reconciliation of the partial
update and the full existing item.
The second form allows PUTting (typically) to a URL that "drills into" the resource, for example a PUT to
.../customer/123/address/zip_code, again where the body contains only the zip code value. This is very
appealing as it treats the hierarchy of the resource as if it were the hierarchy of a file system.
A big problem with both of these is that you encounter significant coordination issues in a real environment. How
do you handle concurrent updates? When the updates are in conflict - say two requests come in to update the zip
[mwh94: At the time I was reading through this section, it felt a bit “chatty”. The reader isn’t going to answer us
here, nor will we hear it if they do.]
We allow POST and we can offer guidance on how to handle smaller focused updates. But we should make that
more work. In particular we should force the service owner who wishes to offer partial update to fully define the
behavior in the face of concurrent updates.
Service owners are certainly welcome to expose resources which are partial views of some data - such as just the
contact information about a customer. And these could quite reasonably be updatable. And this should be done
with careful consideration.
Service owners should be encouraged to offer some form of partial update as an explicit extended operation.
(ouch ouch, that sounds like RPC) Again careful consideration is required. As an example, our Merchant updates
are really the Merchant submitting (PUTting) a contribution to the item. The PUT of this sort of resource triggers a
workflow that processes the contribution and adds the Merchant's properties to the item (or not if we already have
better data, for our definition of better).
Reader/Editor Note:
(1) RFC keywords (such as “MUST”, “SHOULD”, “MAY”) are used in this document. They have normative
implications.
[mwh95: Move this note to the top somewhere?]
(2) In this proposal, we assume that there is a URL that points to the service itself. E.g.:
http://someHost:1234/myFooService .
We also assume there is a URL that points to a resource maintained by the service. E.g.:
http://someHost:1234/myFooService/pathToResourceX .
The URL (physical) vs URI (logical) usage consideration will be addressed in a separate section of the
document.
However, a general business service might have other operations that do not fit into this paradigm. Those
operations are typically related to performing an action or an algorithm. Those operations might be viewed as
resources themselves. To make this document more concise, “algorithmic resource” or “action resource” will be used as
the terms to describe such operations hereafter.
[mwh96: I’m struggling with this here. We already have “algorithmic resource” (a noun) covered way up top.
This section mixes the nouns with the verbs, and calls it all RPC. Should we limit RPC to be just verbs? And then the
distinguishing characteristic is that “operation” is specified somewhere (only in the query parameters? Never in the
uri?).]
Some of these operations are not tied to a particular data-centric resource. For example, consider a service which
manages orders as its data resource. This service might have a currency conversion operation (algorithm) which is
not tied to a particular order document. On the other hand, some of these operations can be based on a particular
data resource instance. The same example ordering related service might have an operation to calculate the
Reader/Editor Note:
The operation name MUST contain alphanumeric characters and underscores only. It MUST start with an alphabetic
character.
The regular expression for a valid operation name is: [a-zA-Z][a-zA-Z0-9_]*
[mwh97: Should we define “identifier” somewhere more global?]
Reader/Editor Note:
An alternative design is to put the operation name in an HTTP header (e.g. “x-amzn-operation”).
Reasons to suggest headers instead of the URL to denote the operation name are: we have not nailed down
details of the resource-naming scheme and query mechanism yet; using a header would allow more flexibility to
innovate in that space.
Resource-based Method
When an Algorithmic Operation requires a Resource to complete, the target URL of the POST request SHOULD
refer to the corresponding Resource instance, followed by the operation identification segment. An example of
this kind of operation is to calculate the shipping cost of an order. An example of a URL for a POST request that
refers to a particular order is:
http://someHost:1234/myOrderService/pathToOrder5678;operation=submitOrder
Service-based Method
When a Non-Standard-Resource Operation does not require a resource to complete, the target URL of the POST
request SHOULD refer to the corresponding Service, followed by the operation identification segment. An
example of this kind of operation is to convert currency. An example of the target URL is:
http://someHost:1234/myOrderService;operation=convertCurrency
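The two URL forms above differ only in what precedes the operation identification segment, so a single helper can build both. The Python sketch below validates the operation name against the regular expression given earlier and appends the “;operation=” segment; the function name `operation_url` is hypothetical.

```python
import re

# Regular expression for a valid operation name, as given in this section.
OPERATION_NAME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")


def operation_url(target_url: str, operation: str) -> str:
    """Append the operation identification segment to a service or resource URL."""
    if not OPERATION_NAME_RE.fullmatch(operation):
        raise ValueError("invalid operation name: %r" % operation)
    return "%s;operation=%s" % (target_url, operation)


if __name__ == "__main__":
    # Service-based method: the target is the service itself.
    print(operation_url("http://someHost:1234/myOrderService", "convertCurrency"))
    # Resource-based method: the target is a particular resource instance.
    print(operation_url("http://someHost:1234/myOrderService/pathToOrder5678", "submitOrder"))
```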
An example of an HTTP Request is:
POST http://someHost:1234/myOrderService;operation=convertCurrency
Content-Type: ...
Content-Length: ...
{
"convertCurrencyRequest" : {
"currencyCode": "USD",
"currencyAmt": 56.00,
"targetCurrencyCode": "EUR"
}
}
Error Responses
HTTP status code: [TODO: to determine what numeric values should be used for “XX” and to determine whether
we can add some extra system level error here; to merge with this section with generic status code description]
5XX-5XX, if a system level error occurs (e.g. parsing error of input payload)
5XX-5XX, if a business level error occurs (e.g. an input argument is out of range as specified by the
business logic)
The HTTP Response Body MAY contain details of an error, which are expressed in one of data formats (such as, Ion,
JSON and XML) supported by this specification. In such a case, “Content-Type” MUST be set accordingly.
Certain client<->server interactions are event-driven, meaning that the information the client is interested in
will arrive at an undetermined time in the future. For instance, an AJAX application that implements instant
messaging waits for the arrival of new instant messages on behalf of the user.
In this case, clients should initiate the connection to the server using a GET request and wait for a set time threshold in
the range of 30 seconds to 5 minutes before terminating the existing request and initiating a new one. Clients
should use persistent connections to avoid unnecessary TCP startup costs.
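A minimal sketch of the client side of this pattern, in Python. The names long_poll, fetch and max_rounds are illustrative assumptions and not part of this specification. The fetch callable stands in for a GET issued over a persistent connection and is assumed to raise TimeoutError when the time threshold elapses without a message.

```python
from typing import Callable, Optional


def long_poll(fetch: Callable[[float], Optional[str]],
              timeout_s: float = 30.0,
              max_rounds: int = 10) -> Optional[str]:
    """Re-issue a blocking GET until a message arrives or max_rounds is reached.

    fetch(timeout_s) is a hypothetical stand-in for an HTTP GET on a
    persistent connection. It returns the message body, or raises
    TimeoutError if nothing arrived within timeout_s seconds.
    """
    if not 30.0 <= timeout_s <= 300.0:
        raise ValueError("timeout_s should be between 30 seconds and 5 minutes")
    for _ in range(max_rounds):
        try:
            return fetch(timeout_s)
        except TimeoutError:
            # Threshold reached: drop this request and start a new one.
            continue
    return None
```

The max_rounds limit exists only so the sketch terminates; a real client would likely keep polling for as long as the user session is active.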
Servers should accept the request and keep it open for a reasonable amount of time. If no message arrives the
10.5 Workflows
I believe we will need to separate automated workflows from those that contain one or more HITs (tasks that a
human is responsible for), and I have not yet done so. For the moment I am "ignoring" those workflows (and
workflow systems) that have activities which humans perform.
A workflow is a requested operation that may take a long time and is processed asynchronously from the
request that starts it. This is distinct from a typical REST method, which operates essentially synchronously; that is,
the request is complete when the result is returned. A request that is handled asynchronously is an intermediate
form. If the operation can be called synchronously then it is not a workflow (even if it is handled by a workflow).
A PUT or POST request is used to start a workflow: PUT if the user knows the ID, POST if not.
Workflows generally, but not universally, start with an "insert": dropping off a request to do work. Examples of
this include merchant feeds, where an entire file is dropped off for processing; item data, where a datagram
keyed by merchant SKU is put to the "front end" of the item pipeline; and the buy button, where a shopping cart
starts the process.
The result of this call will include, in the body, a token that can be used to reference the state of the workflow.
The state MUST include a status, with at least the values IN-PROGRESS, FAIL, SUCCESS and PARTIAL-SUCCESS. It may include a progress
indicator. It SHOULD return appropriate error information when failures (full or partial)
occur. The error information MUST include an identifier of the entity associated with the failure, the failure code (for example
the exception), and any additional information necessary for the caller to take appropriate action.
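The workflow state described above could be modeled roughly as follows. This is a sketch under stated assumptions: the class names and the field names (token, progress, entity_id, failure_code, detail) are hypothetical, since the text fixes only the four status values and the kinds of information an error must carry.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List, Optional


class WorkflowStatus(Enum):
    IN_PROGRESS = "IN-PROGRESS"
    FAIL = "FAIL"
    SUCCESS = "SUCCESS"
    PARTIAL_SUCCESS = "PARTIAL-SUCCESS"


@dataclass
class WorkflowFailure:
    entity_id: str      # identifier of the entity associated with the failure
    failure_code: str   # for example, the exception name
    detail: str = ""    # anything else the caller needs in order to act


@dataclass
class WorkflowState:
    token: str                          # returned in the body of the PUT/POST
    status: WorkflowStatus
    progress: Optional[float] = None    # optional progress indicator (assumed 0.0 to 1.0)
    failures: List[WorkflowFailure] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Render the state as a plain dict, ready for JSON serialization."""
        body = {"token": self.token, "status": self.status.value}
        if self.progress is not None:
            body["progress"] = self.progress
        if self.failures:
            body["failures"] = [asdict(f) for f in self.failures]
        return body
```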
10.6 Extending HTTP Methods
[mwh98: Can this just go away?]
Services may offer extended methods. These should be either included in the HTTP header <put exact header
here> or accessed through batch.
A notable example is partial update of a resource, perhaps as the method MERGE <any other suggestions for this?
Again, if it is common it should be constant.>.
Another example is the "factory" method (append), where the caller doesn't know the key and the service assigns it.
Another is data conversion, such as uploading a BMP when the resource is the equivalent JPG. These patterns need examples
and naming conventions.
Data item - A Data Item is a unit of data, which is either a scalar value (e.g. an integer or a string) or a container
which holds zero or more data items (e.g. an IonStruct or a JSON Array).
Structure - A Data Item that has a collection of named fields. Examples are an IonStruct, a JSON Object and an XML
Element.
Sequence (of Data Items) - A Sequence of Data Items is an ordered collection of zero or more Data Items. The size
of a Sequence MAY be unknown. A Sequence is an abstract data model concept for REST computing (particularly
REST-Query related). A Sequence might be represented by an on-the-wire format (e.g. a JSON array) in some cases,
while in other cases a Sequence might not be represented effectively or correctly by an on-the-wire format
(e.g. a JSON array would not be able to represent a Sequence of unknown size effectively). Sequences never
contain other Sequences. If Sequences are combined, the result is always a “flattened” Sequence. This flattened
nature is to simplify the data model for REST computing. [Note: a JSON array or an Ion List is NOT identical to a
Sequence.]
[mwh99: Standardize on “list”?]
[mwh100: No. No need to imply that lists can’t contain other lists. “combine” is semantically imprecise.]
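One way to picture the difference between a Sequence and a JSON array is with Python iterators. In this sketch, combine and order_ids are hypothetical names; the generator plays the role of a Sequence of unknown size, and combining Sequences yields a single flat Sequence.

```python
from itertools import chain, islice
from typing import Iterable, Iterator


def combine(*sequences: Iterable) -> Iterator:
    """Combine Sequences into one flattened Sequence (no nesting is introduced)."""
    return chain.from_iterable(sequences)


def order_ids() -> Iterator[int]:
    """A Sequence of unknown size: items are produced on demand, with no end known up front."""
    n = 0
    while True:
        n += 1
        yield n


# A JSON array must be fully materialized before it can be written out,
# so only a bounded prefix of an open-ended Sequence can be serialized that way.
first_five = list(islice(combine(["a", "b"], order_ids()), 5))
```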
Resource - A Resource is a data-oriented service component made available to users to perform actions on, and it
MUST always be identified by a URL. Possible actions on a resource are retrieval, mutation or invocation. Actions
are performed through one of the HTTP methods, e.g. GET/PUT/DELETE/POST. During mutation and invocation,
the HTTP Request Body MAY have zero or one Data Item or a Sequence of Data Items as its input. During retrieval,
mutation and invocation, the HTTP Response Body MAY have zero or one Data Item or a Sequence of Data Items as
its output. Resources can be further divided into sub-categories: entity resources, algorithmic resources and
query resources.
REST Service - A REST Service is a service that manages a collection of resources through the HTTP protocol and MUST
always be identified by a URL (a.k.a. Service URL).
Entity Resource - An Entity Resource is a resource: 1) that accepts HTTP GET/PUT/DELETE methods; 2) that
represents a state which lives beyond the duration of the HTTP methods. The state of an Entity Resource is
represented by a single Structure or a Sequence of Structures. The state of an Entity Resource can be retrieved by
HTTP GET and can be mutated by HTTP PUT/DELETE. While some resources return a Sequence of Data Items, the
ordering of Data Items MAY be unspecified in some cases. [Open Issue: An Entity Resource MAY accept POST for
other operation purposes, which may be non-idempotent, such as partial update operations.] An Entity Resource
URL is formed by adding one or more non-empty relative URL path-segments to the Service URL.
Entity Resource Hierarchy - Individual Entity Resources managed by a REST service SHOULD be organized in a
hierarchy of Collections (i.e. Entity Resource Collections) at the service designer’s discretion. All Individual Entity
Resources in a Collection MUST share the same scalar value of a particular named field. All Individual Entity
Resources in a Collection (i.e. the parent Collection) can be further divided and organized into child Collections
(hence, the notion of hierarchy). Individual Entity Resources in a child Collection will share another scalar value of
an additional named field.
[mwh102: If we mean this, the topic should be discussed above. This section isn’t a “definition” for a glossary.]
Each Collection in the hierarchy is identified by a URL. The URL is formed by adding path-segments to the Service
URL. Each path-segment represents a shared scalar value in a collection in the hierarchy. The ordering of the path-
segments added to the URL corresponds to the hierarchical Collection ordering, from parent to child.
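The URL construction described above can be sketched as follows. The function name collection_url is an assumption, and percent-encoding each shared value is a choice made for this sketch that the text does not mandate.

```python
from urllib.parse import quote


def collection_url(service_url: str, *shared_values: str) -> str:
    """Append one path-segment per shared scalar value, ordered parent to child."""
    if any(v == "" for v in shared_values):
        raise ValueError("path-segments must be non-empty")
    # Encode each value so that a "/" inside a value cannot create an extra segment.
    segments = [quote(v, safe="") for v in shared_values]
    return "/".join([service_url.rstrip("/")] + segments)
```

For example, collection_url("http://host:1234/MyFooService", "US", "Electronics") addresses the Electronics child Collection inside the US parent Collection.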
A REST Service has discretion to reject HTTP requests to an Entity Resource Collection. An example scenario: an
HTTP GET request to an Entity Resource Collection is rejected because the Collection is too expensive to compute.
[Open Issue: From Chris: does the shared value have to be a scalar? From Alex: gut feeling says we can expand to
non-scalar data objects, as long as the service provides clear “toString()” and “equal()” semantics on those
non-scalar data objects.]
When the URL refers to a collection of Entity Resources, the trailing “/” is insignificant, which makes the URLs
friendly to web browsers. For example, these URLs are identical in the context of Entity Resources:
http://host:port/MyFooService/US/Electronics/ and http://host:port/MyFooService/US/Electronics
Algorithmic Resource - An Algorithmic Resource is a resource that accepts HTTP POST and GET methods only. The
URL of an Algorithmic Resource is formed by adding the name of the algorithm (a.k.a. operation) as a parameter to an
Entity Resource URL or a Service URL. HTTP GET SHOULD be used only when the operation does not create any
user-visible side effects (which also implies the operation is idempotent).
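A short sketch of how such a URL and method might be chosen. The ";operation=" form follows the convertCurrency example given earlier in this document; the function names algorithmic_url and method_for are assumptions made for illustration.

```python
from urllib.parse import quote


def algorithmic_url(base_url: str, operation: str) -> str:
    """Attach the operation name as a parameter, e.g. ";operation=convertCurrency"."""
    if not operation:
        raise ValueError("operation name must be non-empty")
    return f"{base_url.rstrip('/')};operation={quote(operation, safe='')}"


def method_for(has_visible_side_effects: bool) -> str:
    """GET only when the operation has no user-visible side effects; otherwise POST."""
    return "POST" if has_visible_side_effects else "GET"
```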
Query Resource – A Query Resource is a resource where a Query is applied to a resource as its input. A Query
Resource can be used as an input to another Query Resource. A Query Resource itself is READ-ONLY. A Query
Resource accepts HTTP GET only if the input resource accepts the HTTP GET method (e.g. an Entity Resource).
Alternatively, it accepts the HTTP POST method only if the input resource accepts the HTTP POST method (e.g. an
Algorithmic Resource). A Service MAY reject requests to a Query if it deems the Query too expensive to
compute. [Open Issue: URL Syntax TBD.]
[mwh103: I didn’t distinguish this above. IMHO “collection” covers it, and “can throw a query at it” isn’t quite as
useful. Or not.]
Query - …
OLD TEXT
FILE UPLOAD
File upload is a common operation, in that many Amazon services take in large files from their users. Examples
include the Digital team getting original digital content and Merchant services accepting feeds with Item data.
When the inbound file is itself a resource, that is, something you can GET and for which the submitter knows the
identifying key, PUT is the appropriate method. In other cases this would be handled using a POST, with a suitable
alternate method.
NOTE: partial update is an example of an operation that needs its own method. Should we define this? It's not
"POST".
a simple resource
The basic use case is that the developer wants to make the management of a single resource available to a "public"
audience. While this is nearly a "toy" example it is the basic foundation and we have a large number of examples
of this.
<tbd>
<tbd>
publishing a report
<tbd>
extended methods
o will we have them? - yes
o how to handle them - tbd, current proposal is through “programmatic resources”
metadata access - what to offer (or require), where it lives
o schema language for shape - SDL schema
o JSON vs Ion vs XML vs BSF – JSON for public, Ion internal
o language for API – SDL
o what about other policies, like security, sla's, etc
URL use - keys, query parameter, other tokens
o sub document addressing (continued from last week)
o programmatic resources
query parameter use
query in general
http header use
o especially content type (et al)
o security
o context tracking
o etags
o cache control
o (can we redirect a POST to a PUT?)
cookies (or should they get their own line item?)
error code use
API definition on client (in general)
API definition on server (in general)
What is a resource?
My service interface doesn't have any resources, how could I possibly use REST to expose it?
My browser (or the one I need to support) doesn't support PUT, what do I do now?
How do I do security?
PUT to IMSv2/Contributions/ACME/DOLL
Document:
Item:: {
merchant:ACME,
sku:DOLL,
listing::{ … },
product:: {
description:"a doll",
…
},
}
Returns ok / accepted
MUST be able to perform GET on IMSv2/Contributions/ACME/DOLL – returns status or result.
MUST be idempotent – i.e. PUT to a user-known key. PUTable resources are a strict subset of GETable resources.
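The two rules above (a PUT target must also be GETable, and PUT must be idempotent) can be illustrated with an in-memory stand-in. ContributionStore and its methods are hypothetical and do not describe the IMSv2 service.

```python
class ContributionStore:
    """In-memory stand-in for a service whose PUTable resources are also GETable."""

    def __init__(self) -> None:
        self._items = {}

    def put(self, path: str, document: dict) -> str:
        # Storing under the caller-supplied key makes repeats harmless:
        # a second identical PUT leaves the same state as the first.
        self._items[path] = dict(document)
        return "accepted"

    def get(self, path: str) -> dict:
        # Anything that was PUT can be read back under the same path.
        return dict(self._items[path])

    def count(self) -> int:
        return len(self._items)
```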
POST- eg
POST to /Feeds/SFF/ACME
Can have alternative ways of retrieving data, e.g. retrieving side-effect data such as ASIN
RCAT_OFFERS/mk/asin
12.3 Questions:
For each arc, is it a get? Put? Post? How to decide?
How to do async?
When returning a list of items – JSON – would use an array to represent the list (JavaScript constraints)
Steps
o Get listings
o Choose listing
o Check prices
• Reduce Availability
• Fulfillment confirmation
• Charge visa
Find Items:
POST?
GET? /ItemSearch/<MARKETPLACE>/
?
BrowseNode=137;
Keyword=doll;
Keyword=new;
Returns ASINs / other data
{
…
asins:[Bxxxxxxxxx, Byy… ]
}
Or
Header
{
some status
{ asin:Bxxxxxxxxx}
{ asin:Byy…}
}
???
GET vs POST ?
o Header must include cache-ability parameters as Proxies/caches will (may) consider GET with
query parameters uncacheable.
Order independence?
As we describe how resources are named (LHS and RHS), we should consider using the terminology from RFC 3986,
“Uniform Resource Identifier (URI): Generic Syntax”:
Abstract
A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or
physical resource. This specification defines the generic URI syntax and a process for resolving URI
references that might be in relative form, along with guidelines and security considerations for the use of
URIs on the Internet. The URI syntax defines a grammar that is a superset of all valid URIs, allowing an
implementation to parse the common components of a URI reference without knowing the scheme-
specific requirements of every possible identifier. This specification does not define a generative
grammar for URIs; that task is performed by the individual specifications of each URI scheme.
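As an illustration of that terminology, the sketch below splits a URI into the five generic components RFC 3986 names (scheme, authority, path, query, fragment) using Python's standard urllib.parse.urlsplit. The helper name describe is an assumption, and the query and fragment in the usage note are added purely for illustration.

```python
from urllib.parse import urlsplit


def describe(uri: str) -> dict:
    """Label the parts of a URI with the component names used in RFC 3986."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,   # urlsplit calls the authority "netloc"
        "path": parts.path,          # ";operation=..." stays inside the path
        "query": parts.query,
        "fragment": parts.fragment,
    }
```

Applied to http://someHost:1234/myOrderService;operation=convertCurrency?a=1#f, the authority is someHost:1234 and the operation parameter remains part of the path component.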
Some more data for the RPA use case (which will influence PAS REST interface):
Some of the parameters for RPA are an obvious choice to qualify the resource (e.g. MarketplaceId, MerchantId,
ASIN)
Some parameters control the subset of data that is to be returned (e.g. prefetch list / flavors in RPA, "facets" in
PAS)
Some are data qualifiers / obvious query parameters (e.g. "PreferMerchantImages", "CustomerIsPrime",
"UseFMAv3")
If a batch of 20 ASINs is provided, then this allows RPA to perform, e.g., 40 service calls as opposed to 800
service calls.
These are the entities your service understands, and usually controls. They have a name and some form of
identifier. The URL of a resource must include the service name, the resource name and all the key parts needed to
identify the resource.
key parts - the resource is identified by a key. The key may have one or more logical fields. Marketplace and ASIN
are an example of a two part key. Each of the key parts should
The key parts mimic a directory tree. As such, consideration of their order is important. It may also be useful to
support a specialized form of query through partial key specification. This would return the sub-tree, subject to
any filters specified using URL query parameters. [cas: this should be the only use of query parameters.]
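Partial key specification could work along these lines. This is a sketch that assumes resources are held in a dictionary keyed by tuples of key parts, such as (Marketplace, ASIN); the name sub_tree is hypothetical, and filtering by query parameters is left out.

```python
from typing import Dict, Tuple

Key = Tuple[str, ...]


def sub_tree(resources: Dict[Key, dict], partial_key: Key) -> Dict[Key, dict]:
    """Return every resource whose key starts with partial_key (leading key parts only)."""
    n = len(partial_key)
    return {k: v for k, v in resources.items() if k[:n] == partial_key}
```

Because only leading key parts can be given, the order of the key parts determines which sub-trees are addressable, which is why that order deserves care.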