@bzanchet

When listing objects within a bucket the server returns the etag, last-modified and size values for each object. The SDK simply ignores all those values when creating instances of S3Object and if they're needed it makes a HEAD request (one per object).

I didn't want to change the interface too much-- maybe using the options hash on S3Object isn't the best choice, but consider this patch as more of a conversation starter than anything.

I've been using my fork for a while now and it saves a lot of requests.

Any thoughts?

@trevorrowe
Contributor

I am generally very much in favor of this change, as this has been a long-standing issue. The primary concern is that it is not backwards compatible. For better or for worse, the existing implementation always calls #head_object and returns fresh/current values.

Memoizing these values generally makes sense, but it can introduce bugs in code that relies on the non-static values. For example, I might write a script that polls for a change to last_modified, expecting some other process to change the data. If I updated the SDK and my script then broke, I would be very unhappy.

Given the SDK follows semver, users locked on version 1.x.y should be able to update within 1.x without breaking changes. I would be in favor of this change, if the default behavior could be maintained, and memoized attributes were opt-in. For example:

s3 = AWS::S3.new(s3_cache_object_attributes: true)
s3.buckets['aws-sdk'].objects.each do |obj|
  # look ma, no head request!
  puts obj.key + ' => ' + obj.etag
end

This could be accomplished by registering (in lib/aws/s3/config.rb) a new configuration option and then using client.config.s3_cache_object_attributes to determine when to refresh or not.
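To make the opt-in idea concrete, here is a minimal sketch in plain Ruby. It does not use the SDK: `Config`, `FakeClient`, `FakeObject`, `each_object` and `head_requests_for` are hypothetical stand-ins invented for illustration, and only the option name `s3_cache_object_attributes` comes from the suggestion above. It shows attribute readers falling back to a HEAD request unless the option is enabled and the listing already supplied the value.

```ruby
# Hypothetical stand-ins for illustration only; these are NOT the SDK's classes.
Config = Struct.new(:s3_cache_object_attributes)

class FakeClient
  attr_reader :config, :head_requests

  def initialize(config)
    @config = config
    @head_requests = 0
    @store = {
      'a.txt' => { etag: '"abc"', size: 3 },
      'b.txt' => { etag: '"def"', size: 5 }
    }
  end

  # Simulates a list call, which already returns etag and size for each key.
  def list_objects
    @store.map { |key, attrs| attrs.merge(key: key) }
  end

  # Simulates a HEAD request and counts it.
  def head_object(key)
    @head_requests += 1
    @store.fetch(key)
  end
end

class FakeObject
  attr_reader :key

  def initialize(client, key, listed = {})
    @client = client
    @key = key
    @listed = listed # attributes that came back with the listing
  end

  def etag
    attribute(:etag)
  end

  def size
    attribute(:size)
  end

  private

  # Uses the listed value only when the option is on; otherwise issues a HEAD.
  def attribute(name)
    if @client.config.s3_cache_object_attributes && @listed.key?(name)
      @listed[name]
    else
      @client.head_object(@key)[name]
    end
  end
end

# Yields one FakeObject per listed entry, seeded with the listed attributes.
def each_object(client)
  client.list_objects.each do |entry|
    yield FakeObject.new(client, entry[:key], entry)
  end
end

# Reads etag and size of every object and reports how many HEADs were issued.
def head_requests_for(cache_enabled)
  client = FakeClient.new(Config.new(cache_enabled))
  each_object(client) { |obj| obj.etag; obj.size }
  client.head_requests
end
```

With the option off, every attribute read costs one simulated HEAD request, which matches the existing behavior described above. With it on, the values from the listing are reused and no HEAD is issued.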

As a side note, the v2 SDK has a very early preview of the S3 object interface available. V2 of the SDK works as expected out of the box and memoizes all resource data by default.

s3 = Aws::S3::Resource.new
s3.bucket('aws-sdk').objects.each do |obj|
  # look ma, no head request!
  puts obj.key + ' => ' + obj.etag
end

The v2 aws-sdk-core gem is available now as a preview release. I'll be cutting preview releases of the aws-sdk-resources and aws-sdk gems shortly. The GitHub project is here: https://github.com/aws/aws-sdk-core-ruby

@bzanchet
Author

Hey Trevor,

thanks so much for your quick feedback. I agree that this patch isn't backwards compatible and that's a problem - one simplistic solution would be to make "force_refresh" default to true - but it felt too hacky even for scratching my own itch :)

Anyway, I like the idea of adding a new configuration - would you say it's worth doing? It shouldn't take more than a couple hours to make this patch rely on a new configuration and be 100% backwards compatible. Or should I and everybody else just start using the RC of the new version?

Best!
Bruno

@trevorrowe
Contributor

Version 2 is different enough from v1 that it is not a drop-in replacement in a number of places. That said, they use different namespaces, so you can use both gems in the same project. The latest v1 release, 1.52.0, is available as both the aws-sdk gem and the aws-sdk-v1 gem. This ensures you will be able to install aws-sdk ~> 2 and aws-sdk-v1 with only minimal Gemfile updates.

To answer your other question, I would be willing to merge this pull request, provided we can ensure backwards compatibility and the configuration option is simple enough for users to opt in to this feature. I have stopped feature work on v1, so I likely won't tackle this myself, but we welcome community contributions, and this seems like a good fit.
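As a rough illustration of the side-by-side setup described here, a Gemfile might look like the following. The gem names and the `~> 2` constraint are taken from the comment above; the `source` line is an assumption, and a preview release of v2 may need a different version constraint.

```ruby
# Gemfile (sketch; the source line is assumed, adjust to your project)
source 'https://rubygems.org'

gem 'aws-sdk-v1'      # v1 code, under the AWS namespace
gem 'aws-sdk', '~> 2' # v2 code, under the Aws namespace
```

Because v1 lives under `AWS` and v2 under `Aws`, existing v1 code should keep working while new code is written against v2.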

If you would find it useful enough to stop using your fork and have this merged into mainline, I'd be happy to see that happen.

@bzanchet
Author

Hey Trevor, there you go. It took less time than I expected.

I stuck with the configuration name you mentioned. Can you think of a better one? I also removed the "force_refresh" option for simplicity's sake. I guess one can just create a new instance of the object to refresh its attributes. Rails has a "reload" method on ActiveRecord instances, and if we're going down that path, a full "reload" (which would just return a new, empty instance of the same S3Object) is probably simpler and better than having a "force_refresh" on each of the attributes.
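The "reload" idea could look roughly like this. It is a sketch only: `CachedObject` and its `fetcher` argument are hypothetical names made up for illustration, not part of S3Object or of this patch, and `fetcher` stands in for whatever performs the HEAD request.

```ruby
# Hypothetical class for illustration; NOT the SDK's S3Object.
class CachedObject
  attr_reader :key

  def initialize(key, fetcher, attributes = {})
    @key = key
    @fetcher = fetcher           # callable standing in for a HEAD request
    @attributes = attributes.dup # values memoized from the listing, if any
  end

  # Returns the memoized etag, fetching it once if it was never supplied.
  def etag
    @attributes[:etag] ||= @fetcher.call(@key)[:etag]
  end

  # Returns a new instance with no memoized attributes, so the next
  # attribute read goes back to the fetcher.
  def reload
    self.class.new(@key, @fetcher)
  end
end

# Usage: the original instance keeps its memoized value,
# while the reloaded instance asks the fetcher again.
remote = { etag: '"v1"' }
object = CachedObject.new('report.csv', ->(_key) { remote }, { etag: '"v1"' })
remote[:etag] = '"v2"'
fresh = object.reload
```

Keeping refresh as a single method that returns a new instance avoids threading a per-attribute flag through every reader, which is the trade-off described above.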

Looks better?

@bzanchet
Author

Also, do you want me to squash all the commits? I couldn't find a way to do that without closing this PR and creating a new one.
And a neat trick: it's easier to look at the changes if you add "?w=1" to the GitHub URL. It does the same as the -w option in git (as far as I know this isn't documented anywhere). That helps here because my editor removes all trailing spaces, which messes up the raw diff.

@bzanchet
Author

Hello?

@trevorrowe
Contributor

Sorry, this fell off my radar. Looks good to me.

trevorrowe added a commit that referenced this pull request Sep 12, 2014
memoizes data available on the ListBucket request
@trevorrowe trevorrowe merged commit 5274c65 into aws:master Sep 12, 2014
trevorrowe added a commit that referenced this pull request Sep 26, 2014