{"id":2278,"date":"2012-04-03T13:37:49","date_gmt":"2012-04-03T19:37:49","guid":{"rendered":"http:\/\/bililite.nfshost.com\/blog\/?p=2278"},"modified":"2012-04-03T15:11:14","modified_gmt":"2012-04-03T21:11:14","slug":"using-s3-files-in-php","status":"publish","type":"post","link":"https:\/\/bililite.com\/blog\/2012\/04\/03\/using-s3-files-in-php\/","title":{"rendered":"Using S3 files in PHP"},"content":{"rendered":"<p><a href=\"\/blog\/2012\/04\/02\/using-nfs-net-with-amazon-s3\/\">As I wrote<\/a>, I'm using <a href=\"http:\/\/aws.amazon.com\/s3\/\">Amazon S3<\/a> to store files that are too expensive to keep on my web server, with the plan of having frequently-updated files on the server and relatively constant stuff on S3. The address for my S3 server is bililite.s3.amazonaws.com, which is stored in the global variable <code class=\"language-php\">$_SERVER['CDN']<\/code>.<\/p>\r\n<p>So to <a href=\"http:\/\/php.net\/include\">include<\/a> a file, I would do:<\/p>\r\n<pre><code class=\"language-php\">$filename = '\/toinclude.php';\r\nif (file_exists($_SERVER['DOCUMENT_ROOT'].$filename)){\r\n  $filename = $_SERVER['DOCUMENT_ROOT'].$filename;\r\n}else{\r\n  $filename = $_SERVER['CDN'].$filename;\r\n}\r\ninclude ($filename);<\/code><\/pre>\r\n<p>Which I use often enough to want to generalize it into a class.<\/p>\r\n<!--more-->\r\n<p>The other thing that would be useful is a directory listing, which is harder than it sounds for S3 since it has no directory structure; it's just a flat database of keys (the equivalent of filenames) and values (the files themselves). Thus <code>bililite.s3.amazonaws.com\/images\/silk\/add.png<\/code> has S3 return the file labelled <code>\/images\/silk\/add.png<\/code>; it has no inherent relationship to <code>\/images\/silk\/delete.png<\/code> or <code>\/images\/silk\/<\/code>.<\/p>\r\n<p>The key is that just retrieving the server URL returns an XML listing of all the files, and there is an API to limit the files returned. 
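<\/p>
<p>The include-with-fallback pattern above can be captured in a small helper function in the meantime. This is just a sketch; <code>cdn_path<\/code> and its parameter names are mine, not part of the class described later in this post:<\/p>

```php
<?php
// A sketch of generalizing the include fallback above: resolve a filename
// to the local copy if it exists, otherwise to the same name on the CDN.
// (cdn_path is a hypothetical name, not part of the S3 class in this post.)
function cdn_path($filename, $documentRoot, $cdn) {
    if (file_exists($documentRoot.$filename)) {
        return $documentRoot.$filename;   // serve the local copy
    }
    return $cdn.$filename;                // fall back to the CDN copy
}
```

<p>Usage would then be <code class=\"language-php\">include(cdn_path('\/toinclude.php', $_SERVER['DOCUMENT_ROOT'], $_SERVER['CDN']));<\/code>.<\/p>
<p>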
<a href=\"http:\/\/bililite.s3.amazonaws.com\">bililite.s3.amazonaws.com<\/a> returns all the files (up to a numerical limit; see below). <a href=\"http:\/\/bililite.s3.amazonaws.com?prefix=images\/silk\/\">bililite.s3.amazonaws.com?prefix=images\/silk\/<\/a> returns all the filenames that start with <code>images\/silk\/<\/code> (note no leading slash). That's not quite enough, since it also lists every file inside the sub-folders individually, but the <code>delimiter<\/code> parameter tells S3 to group all files that contain the delimiter after the prefix into one entry in the XML list. That's the equivalent of a subfolder. So <a href=\"http:\/\/bililite.s3.amazonaws.com?prefix=images\/silk\/&delimiter=\/\">bililite.s3.amazonaws.com?prefix=images\/silk\/&amp;delimiter=\/<\/a> gives us the list we want.<\/p>\r\n<p>One more subtlety: S3 returns a maximum of 1000 names, then sets a flag in the XML to say the list was truncated. You can then ask for the next 1000 by passing the last returned key as the <code>marker<\/code> parameter.<\/p>\r\n<p>The <a href=\"http:\/\/docs.amazonwebservices.com\/AmazonS3\/latest\/API\/RESTBucketGET.html\">documentation<\/a> is pretty opaque, but it's all in there.<\/p>\r\n<p><a href=\"\/blog\/blogfiles\/highlight.php?source=s3.class.php\">I put it all together<\/a> into an abstract class that just handles the file part (assuming that any CDN would work the same way, just appending the host name to the file name) and a concrete class that handles the S3-specific directory-simulating parts. <a href=\"\/blog\/blogfiles\/highlight.php?source=s3.class.php\">See the source code<\/a>. 
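<\/p>
<p>Put together, the <code>prefix<\/code>\/<code>delimiter<\/code>\/<code>marker<\/code> machinery sketches out like this. The function names (<code>parseListing<\/code>, <code>listFolder<\/code>) are mine, not the class's methods; the element names come from the <code>ListBucketResult<\/code> XML that S3 returns:<\/p>

```php
<?php
// Sketch of simulating a directory listing with the S3 GET Bucket API.
// parseListing and listFolder are illustrative names, not the class's methods.

// Parse one ListBucketResult page into entry names, plus the marker to
// request the next page with (null when the listing was not truncated).
function parseListing($xmlString) {
    $xml = simplexml_load_string($xmlString);
    $xml->registerXPathNamespace('s3', 'http://s3.amazonaws.com/doc/2006-03-01/');
    $names = array();
    foreach ($xml->xpath('//s3:Contents/s3:Key') as $key) {
        $names[] = (string) $key;            // an actual file
    }
    foreach ($xml->xpath('//s3:CommonPrefixes/s3:Prefix') as $prefix) {
        $names[] = (string) $prefix;         // a simulated sub-folder
    }
    $truncated = $xml->xpath('//s3:IsTruncated');
    $more = $truncated && ((string) $truncated[0]) === 'true';
    return array($names, $more ? end($names) : null);
}

// Fetch every page of a simulated folder listing (note: no leading slash
// on the prefix, and at most 1000 names come back per request).
function listFolder($bucketUrl, $folder) {
    $all = array();
    $marker = null;
    do {
        $url = $bucketUrl.'?prefix='.urlencode($folder).'&delimiter='.urlencode('/');
        if ($marker !== null) $url .= '&marker='.urlencode($marker);
        list($names, $marker) = parseListing(file_get_contents($url));
        $all = array_merge($all, $names);
    } while ($marker !== null);
    return $all;
}
```

<p>Something like <code class=\"language-php\">listFolder('http:\/\/bililite.s3.amazonaws.com', 'images\/silk\/')<\/code> would then approximate a directory listing of the silk icons.<\/p>
<p>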
The method names are meant to parallel the built-in PHP functions.<\/p>\r\n<div class=\"prelike\"><dl>\r\n<dt>constructor<\/dt>\r\n<dd><code class=\"language-php\">$s3 = new S3('http:\/\/bililite.s3.amazonaws.com');<\/code><\/dd>\r\n<dt>realpath<\/dt>\r\n<dd><pre><code class=\"language-php\">$path = $s3->realpath('\/toinclude.php');\r\ninclude($path);\r\n\/\/ or\r\n$content = file_get_contents($path);<\/code><\/pre> returns the real path for the file, either from <code class=\"language-php\">$_SERVER['DOCUMENT_ROOT']<\/code> or the S3 root passed in with the constructor. In other words, if the file exists on the web server, <code>realpath<\/code> returns something like '\/public\/www\/toinclude.php', and if it does not, it returns something like 'http:\/\/bililite.s3.amazonaws.com\/toinclude.php'. Note that if the file does not exist on the web server, this will return the path on the S3 root without checking if the file actually exists; use <code>file_exists<\/code> for that.<\/dd>\r\n<dt>isCDN<\/dt>\r\n<dd><code class=\"language-php\">$flag = $s3->isCDN($s3->realpath('\/toinclude.php'));<\/code> returns <code class=\"language-php\">FALSE<\/code> if the path represents a file on the web server (i.e. from <code class=\"language-php\">$_SERVER['DOCUMENT_ROOT']<\/code>), <code class=\"language-php\">TRUE<\/code> otherwise (note that it does not check if the file actually exists on the S3 server). 
Note also that this requires the path as returned by <code>realpath<\/code>, not the original filename.<\/dd>\r\n<dt>file_exists<\/dt>\r\n<dd><code class=\"language-php\">$flag = $s3->file_exists('\/toinclude.php');<\/code> returns <code class=\"language-php\">TRUE<\/code> if the file exists on the web server or the S3 server (unlike <code>realpath<\/code>, this does check the S3 server).<\/dd>\r\n<dt>filemtime<\/dt>\r\n<dd><code class=\"language-php\">$timestamp = $s3->filemtime('\/toinclude.php');<\/code> returns the time the file was last modified.<\/dd>\r\n<dt>scandir<\/dt>\r\n<dd><code class=\"language-php\">$files = $s3->scandir('\/images\/');<\/code> returns an array of names of files that exist <em>either<\/em> on the web server or the S3 server (it's the union of the directory contents).<\/dd>\r\n<\/dl><\/div>\r\n<p>This assumes that the ACL (access control list) for the files is set to allow anonymous reading; if not, use Donovan Sch\u00f6nknecht's excellent <a href=\"http:\/\/undesigned.org.za\/2007\/10\/22\/amazon-s3-php-class\">S3 class<\/a>. Of course, you'd have to rename one of these classes to avoid the conflict (or use <a href=\"http:\/\/php.net\/namespace\">namespaces<\/a>).<\/p>\r\n<p>Hope this helps someone.<\/p>","protected":false},"excerpt":{"rendered":"As I wrote, I'm using Amazon S3 to store files that are too expensive to keep on my web server, with the plan of having frequently-updated files on the server and relatively constant stuff on S3. The address for my S3 server is bililite.s3.amazonaws.com, which is stored in the global variable $_SERVER['CDN']. 
So to include [&hellip;]","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9,20],"tags":[],"_links":{"self":[{"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/posts\/2278"}],"collection":[{"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/comments?post=2278"}],"version-history":[{"count":23,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/posts\/2278\/revisions"}],"predecessor-version":[{"id":2301,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/posts\/2278\/revisions\/2301"}],"wp:attachment":[{"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/media?parent=2278"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/categories?post=2278"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/tags?post=2278"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}