memoizes data available on the ListBucket request #610

bzanchet · 2014-08-27T17:22:56Z

When listing objects within a bucket the server returns the etag, last-modified and size values for each object. The SDK simply ignores all those values when creating instances of S3Object and if they're needed it makes a HEAD request (one per object).
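To make the cost concrete, here is a minimal sketch of the request pattern described above. `FakeS3Client`, its `list_objects` and `head_object` methods, and the sample data are hypothetical stand-ins rather than SDK classes; they exist only to count requests.

```ruby
# Hypothetical stand-in for an S3 client; it only counts requests.
class FakeS3Client
  attr_reader :request_count

  def initialize(objects)
    @objects = objects # key => { etag:, size:, last_modified: }
    @request_count = 0
  end

  # One list request returns the metadata of every object.
  def list_objects
    @request_count += 1
    @objects.map { |key, meta| meta.merge(key: key) }
  end

  # One HEAD request returns the metadata of a single object.
  def head_object(key)
    @request_count += 1
    @objects.fetch(key)
  end
end

objects = {
  'a.txt' => { etag: '"e1"', size: 10, last_modified: Time.at(0) },
  'b.txt' => { etag: '"e2"', size: 20, last_modified: Time.at(60) },
  'c.txt' => { etag: '"e3"', size: 30, last_modified: Time.at(120) }
}

# Ignoring the list metadata: one list request plus one HEAD per object.
client = FakeS3Client.new(objects)
etags_via_head = client.list_objects.map do |entry|
  client.head_object(entry[:key])[:etag]
end
requests_with_head = client.request_count

# Reusing the list metadata: the list request alone is enough.
client = FakeS3Client.new(objects)
etags_from_list = client.list_objects.map { |entry| entry[:etag] }
requests_from_list = client.request_count
```

For N listed objects the first approach issues 1 + N requests and the second issues 1, which is the saving this patch is after.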

I didn't want to change the interface too much-- maybe using the options hash on S3Object isn't the best choice, but consider this patch as more of a conversation starter than anything.

I've been using my fork for a while now and it saves a lot of requests.

Any thoughts?

trevorrowe · 2014-08-27T18:34:32Z

I am generally very much in favor of this change, as this has been a long-standing issue. The primary issue is that this is a non-backwards-compatible change. For better or for worse, the existing implementation always calls #head_object and returns fresh/current values.

Memoizing these values generally makes sense, but it can introduce bugs in code that relies on the non-static values. For example, I might write a script that polls for a change to last_modified, expecting some other process to change the data. If I updated SDKs and then my script broke, I would be very unhappy.

Given the SDK follows semver, users locked on version 1.x.y should be able to update within 1.x without breaking changes. I would be in favor of this change, if the default behavior could be maintained, and memoized attributes were opt-in. For example:

s3 = AWS::S3.new(s3_cache_object_attributes: true)
s3.buckets['aws-sdk'].objects.each do |obj|
  # look ma, no head request!
  puts obj.key + ' => ' + obj.etag
end

This could be accomplished by registering (in lib/aws/s3/config.rb) a new configuration option and then using client.config.s3_cache_object_attributes to determine when to refresh or not.
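As a rough illustration of that opt-in check, here is a self-contained sketch. Only the option name `s3_cache_object_attributes` comes from this thread; `Config`, `CountingClient`, and `ListedObject` are hypothetical stand-ins, not the SDK's classes or its configuration API.

```ruby
# Hypothetical config holder; only the option name comes from this thread.
Config = Struct.new(:s3_cache_object_attributes)

# Hypothetical client that counts HEAD requests instead of calling S3.
class CountingClient
  attr_reader :config, :head_requests

  def initialize(config)
    @config = config
    @head_requests = 0
  end

  def head_object(_key)
    @head_requests += 1
    { etag: '"fresh"' }
  end
end

# Hypothetical object handle built from one entry of a list response.
class ListedObject
  def initialize(client, key, listed = {})
    @client = client
    @key = key
    @listed = listed # attributes copied from the list response
  end

  def etag
    if @client.config.s3_cache_object_attributes && @listed.key?(:etag)
      @listed[:etag] # opt-in: reuse the listed value
    else
      @client.head_object(@key)[:etag] # default: always ask the server
    end
  end
end

# Default behavior is unchanged: every read issues a HEAD request.
default_client = CountingClient.new(Config.new(false))
default_etag = ListedObject.new(default_client, 'k', { etag: '"listed"' }).etag

# Opt-in behavior: the listed value is returned without a request.
caching_client = CountingClient.new(Config.new(true))
cached_etag = ListedObject.new(caching_client, 'k', { etag: '"listed"' }).etag
```

Keeping the flag off by default is what preserves 1.x behavior for callers that depend on fresh values.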

As a side note, the v2 SDK has a very early preview of the S3 object interface available. V2 of the SDK works as expected out of the box and memoizes all resource data by default.

s3 = Aws::S3::Resource.new
s3.bucket('aws-sdk').objects.each do |obj|
  # look ma, no head request!
  puts obj.key + ' => ' + obj.etag
end

The v2 aws-sdk-core gem is available now as a preview release, and I'll be cutting preview releases of the aws-sdk-resources and aws-sdk gems shortly. The GitHub project is here: https://github.com/aws/aws-sdk-core-ruby

bzanchet · 2014-08-27T20:59:17Z

Hey Trevor,

thanks so much for your quick feedback. I agree that this patch isn't backwards compatible and that's a problem - one simplistic solution would be to make "force_refresh" default to true - but it felt too hacky even for scratching my own itch :)

Anyway, I like the idea of adding a new configuration - would you say it's worth doing? It shouldn't take more than a couple hours to make this patch rely on a new configuration and be 100% backwards compatible. Or should I and everybody else just start using the RC of the new version?

Best!
Bruno

trevorrowe · 2014-08-27T21:17:49Z

Version 2 is different enough from v1 that it is not a drop-in replacement in a number of places. That said, they use different namespaces, so you can use both gems in the same project. The latest v1 release, 1.52.0, is available as both the aws-sdk gem and the aws-sdk-v1 gem. This ensures you will be able to install aws-sdk ~> 2 and aws-sdk-v1 with only minimal Gemfile updates.
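A Gemfile along these lines is roughly what that looks like. This is a sketch based on the gem names and version constraint mentioned above; the comments about namespaces reflect the `AWS` (v1) and `Aws` (v2) modules used in the examples in this thread.

```ruby
# Gemfile (sketch)
source 'https://rubygems.org'

gem 'aws-sdk', '~> 2' # v2, defines the Aws namespace
gem 'aws-sdk-v1'      # v1, defines the AWS namespace
```

Because the two gems define different top-level modules, existing `AWS::S3` code can keep running while new code is written against `Aws::S3`.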

To answer your other question, I would be willing to merge this pull request, provided we can ensure backwards compatibility and the configuration option is simple enough for users to opt in to this feature. I have stopped feature work on v1, so I likely won't tackle this, but we welcome community contributions, and this seems like a good fit.

If you would find it useful enough to stop using your fork and have this merged into mainline, I'd be happy to see that happen.

bzanchet · 2014-08-28T02:12:55Z

Hey Trevor, there you go - took less than I expected.

I stuck with the configuration name you mentioned-- can you think of a better one? I also removed the "force_refresh" option for simplicity's sake. I guess one can just create a new instance of the object if they want to refresh its attributes. Rails has a "reload" method on ActiveRecord instances-- I think if we're going down that path, a full "reload" (which would just return a new, empty instance of the same S3Object) is probably simpler/better than having a "force_refresh" in each one of the attributes.
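To illustrate the "reload" idea, here is a minimal sketch. `CachedObject` and the `fetcher` lambda are hypothetical and only mimic the shape of the proposal; this is not code from the patch.

```ruby
# Hypothetical object handle with memoized attributes.
class CachedObject
  def initialize(key, fetcher, cached = {})
    @key = key
    @fetcher = fetcher   # stands in for a HEAD request
    @cached = cached.dup # attributes memoized from a list response
  end

  # Returns the memoized value, fetching it only if it is missing.
  def etag
    @cached[:etag] ||= @fetcher.call(@key)
  end

  # Returns a new instance for the same key with nothing memoized.
  def reload
    self.class.new(@key, @fetcher)
  end
end

current_etag = '"v1"'
fetcher = ->(_key) { current_etag }

obj = CachedObject.new('report.csv', fetcher, { etag: '"v1"' })
current_etag = '"v2"' # simulate another process overwriting the object

stale = obj.etag        # still the memoized value
fresh = obj.reload.etag # new instance, so the value is fetched again
```

A single `reload` keeps the attribute readers free of extra parameters, at the cost of discarding every memoized attribute at once.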

Looks better?

bzanchet · 2014-08-28T02:17:20Z

Also-- do you want me to squash all the commits? I couldn't find a way to do that without closing this PR and creating a new one.
And a neat trick: it's easier to look at the changes if you add "?w=1" to the GitHub URL. It does the same as the -w option on git (as far as I know this isn't documented anywhere). I mention it because my editor removes all trailing spaces, which messes up the raw diff.

bzanchet · 2014-09-12T19:14:18Z

Hello?

trevorrowe · 2014-09-12T19:16:19Z

Sorry, this fell off my radar. Looks good to me.

bzanchet added 4 commits August 25, 2014 16:00

memoizing etag, content-length and last-modified from list request on s3object

89224ac

adding tests for etag, content_length and last_modified memoization on s3object

4ea7b7f

parsing data when memoizing content_length and last_modified on s3::object_collection

f2b5015

parsing data when memoizing content_length and last_modified on s3::object_collection

64568f0

bzanchet added 2 commits August 27, 2014 22:58

adding configuration for s3 to cache object attributes from ListBucket

fea968b

last_modified is already a Time instance

488a559

trevorrowe added a commit that referenced this pull request Sep 12, 2014

Merge pull request #610 from bzanchet/master

5274c65

memoizes data available on the ListBucket request

trevorrowe merged commit 5274c65 into aws:master Sep 12, 2014

trevorrowe added a commit that referenced this pull request Sep 26, 2014

Tag release v1.54.0

b2da587

References: #604, #610, #627, #630, #631, #633
