What's in Amazon's buckets?

Wed 25th May 11

While catching up on some old Hak5 episodes I found the piece on Amazon's S3 storage. If you don't know what S3 is then I recommend going and watching the episode; it gives a good introduction and was all I'd seen before starting this project. The thing that caught my eye, and Darren's, was when Jason mentioned that each bucket has to have a unique name across the whole of the S3 system. As soon as I heard that I was thinking: let's brute-force some bucket names.

So I signed up for the free tier and started investigating. I created a couple of buckets and looked at the options: by default a bucket is private and only accessible by the owner, but you can add new permissions which make the bucket publicly accessible. I made one bucket private and one public, then hit their URLs to see what would happen. This is what I got back:

Private bucket

<Error>
	<Code>AccessDenied</Code>
	<Message>Access Denied</Message>
	<RequestId>7F3987394757439B</RequestId>
	<HostId>kyMIhkpoWafjruFFairkfim383jtznAnwiyKSTxv7+/CIHqMBcqrXV2gr+EuALUp</HostId>
</Error>

Public bucket

<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
	<Name>digipublic</Name>
	<Prefix></Prefix>
	<Marker></Marker>
	<MaxKeys>1000</MaxKeys>
	<IsTruncated>false</IsTruncated>
</ListBucketResult>
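The root element of the XML tells the two cases apart, which is easy to check in a script. Here's a minimal sketch in Python (not the actual tool's code; the `NoSuchBucket` error code for unassigned names is based on my understanding of S3's error responses):

```python
import xml.etree.ElementTree as ET

def classify_bucket(body):
    """Classify an S3 response body as public, private or unassigned
    by looking at the root element and the error code of the XML."""
    root = ET.fromstring(body)
    tag = root.tag.split('}')[-1]  # drop the xmlns prefix, if any
    if tag == 'ListBucketResult':
        return 'public'            # listing succeeded
    if root.findtext('Code') == 'AccessDenied':
        return 'private'           # bucket exists but we can't read it
    return 'unassigned'            # e.g. a NoSuchBucket error
```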

There is an obvious difference between the two, so that will be easy to test for in a script. The next thing I looked at was the region. When you set a bucket up you can specify which of the five data centres the data is stored in, so your data is closer to your target audience. You get the following options:

  • US Standard
  • Ireland
  • Northern California
  • Singapore
  • Tokyo

So I set up a bucket in each region and accessed them all. The only difference when accessing them is the hostname; this is the mapping:

  • US Standard = http://s3.amazonaws.com
  • Ireland = http://s3-eu-west-1.amazonaws.com
  • Northern California = http://s3-us-west-1.amazonaws.com
  • Singapore = http://s3-ap-southeast-1.amazonaws.com
  • Tokyo = http://s3-ap-northeast-1.amazonaws.com

But as bucket names have to be unique across the whole of S3, what happens if you access a bucket in Tokyo using the hostname for Ireland?

<Error>
	<Code>PermanentRedirect</Code>
	<Message>
		The bucket you are attempting to access must be addressed using the
		specified endpoint. Please send all future requests to this endpoint.
	</Message>
	<RequestId>4834475949AFC737</RequestId>
	<Bucket>digitokyo</Bucket>
	<HostId>TC1DCxcxiejfiek33492034AqtEVBxr+1Oj0GJvmCktGVrlcdZz9YjX5wHMbITi2</HostId>
	<Endpoint>digitokyo.s3-ap-northeast-1.amazonaws.com</Endpoint>
</Error>

They kindly redirect you to the correct hostname.
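That `<Endpoint>` element is everything a script needs to retry the request in the right region. A small helper, assuming error XML shaped like the response above:

```python
import xml.etree.ElementTree as ET

def redirect_endpoint(body):
    """Return the <Endpoint> host from a PermanentRedirect error,
    or None if the response isn't a redirect."""
    root = ET.fromstring(body)
    if root.tag == 'Error' and root.findtext('Code') == 'PermanentRedirect':
        return root.findtext('Endpoint')
    return None
```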

With all this info I built a script which would take a word list and run through it, trying to access a bucket for each word. It nicely parsed the returned XML, followed redirections and produced a list showing public, private and unassigned buckets.
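The core loop is simple. This isn't the script's actual source, just a sketch of the idea in Python, mapping HTTP status codes to bucket states (200 for a public listing, 403 for a private bucket, anything else treated as unassigned):

```python
import urllib.error
import urllib.request

def check_bucket(name, host='http://s3.amazonaws.com'):
    """Request http://<host>/<name> and return (status code, body)."""
    try:
        with urllib.request.urlopen('%s/%s' % (host, name)) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()

def scan(wordlist, check=check_bucket):
    """Try each candidate name and record whether the bucket is
    public, private or unassigned."""
    results = {}
    for name in wordlist:
        status, _ = check(name)
        if status == 200:
            results[name] = 'public'
        elif status == 403:
            results[name] = 'private'
        else:
            results[name] = 'unassigned'
    return results
```

A real run would also parse the body for a PermanentRedirect and retry against the endpoint S3 hands back, as shown in the error response earlier.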

That was good, but what about files? I put some files in my public bucket and hit its URL:

<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
	<Name>digipublic</Name>
	<Prefix></Prefix>
	<Marker></Marker>
	<MaxKeys>1000</MaxKeys>
	<IsTruncated>false</IsTruncated>
	<Contents>
		<Key>my_file</Key>
		<LastModified>2011-05-16T10:47:16.000Z</LastModified>
		<ETag>"51fff3c9087648822c0a21212907934a"</ETag>
		<Size>6429</Size>
		<StorageClass>STANDARD</StorageClass>
	</Contents>
</ListBucketResult>

That is a directory listing; that is good!

I put some more files in, some private and some public, and they all showed up in the list. Trying to access the private files, though, resulted in a "403 Forbidden" being returned along with a bunch of XML similar to that for a private bucket. I can still use this: by doing a HEAD request on each file in the directory listing I get either a "200 OK" or a "403 Forbidden", which means I can enumerate all the files to see whether they are public or private.
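That enumeration step can be sketched as follows, again in Python rather than the real script. The namespace URI is the one from the listings above; `file_is_public` makes a live HEAD request, so it's shown here only for illustration:

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

NS = '{http://s3.amazonaws.com/doc/2006-03-01/}'

def list_keys(listing_xml):
    """Pull the file names (the <Key> elements) out of a ListBucketResult."""
    root = ET.fromstring(listing_xml)
    return [c.findtext(NS + 'Key') for c in root.iter(NS + 'Contents')]

def file_is_public(url):
    """HEAD the file's URL: 200 means public, 403 means private."""
    req = urllib.request.Request(url, method='HEAD')
    try:
        with urllib.request.urlopen(req):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 403:
            return False
        raise
```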

Quick summary... Given a word list I can check which buckets exist and, if they do, whether they are public or private. For all public ones I can get a directory listing, and from that listing I can see which files are public and which are private. I think that is pretty good for a morning's work.

I called the script Bucket Finder and you can download it from its project page.

I've run the script a few times with some nice long word lists and got some interesting data back, but as this post is getting a bit long I'll stop here; you can read the analysis in Analysing Amazon's Buckets.