Make Remote Files Local with Ruby Tempfile
Lawson Kurtz, Former Senior Developer
We live in the age of remote resources. It's pretty rare to store uploaded files on the same machine as your server process. File storage these days is almost completely remote, and for very good reasons.
Using file storage services like S3 is awesome, but not having your files accessible locally can complicate file-oriented operations. In these cases, use Ruby's Tempfile to create local files that live just long enough to satisfy your processing needs.
Anatomy of Tempfile#new
require 'tempfile'
# What you want the tempfile filename to start with
name_start = 'my_special_file'
# What you want the tempfile filename to end with (probably including an extension)
# by default, tempfilenames carry no extension
name_end = '.gif'
# Where you want the tempfile to live
location = '/path/to/some/dir'
# Options for the tempfile (e.g. which encoding to use)
options = { encoding: Encoding::UTF_8 }
# Will create a tempfile
# at /path/to/some/dir/my_special_file20140224-1234-abcd123.gif
# (where '20140224-1234-abcd123' represents some unique timestamp & token)
# with a UTF-8 encoding
# with the contents 'Hello, tempfile!'
# Note: Tempfile.new does not call a block, so we write to the returned object.
# The options hash must be splatted into keyword arguments on Ruby 3+.
file = Tempfile.new([name_start, name_end], location, **options)
file.write('Hello, tempfile!')
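To see how the prefix and suffix end up in the generated name, here is a minimal sketch. It assumes a scratch directory from `Dir.mktmpdir` in place of `/path/to/some/dir`, and `build_special_tempfile` is a hypothetical helper introduced only for this illustration.

```ruby
require 'tempfile'
require 'tmpdir'

# Hypothetical helper: builds a tempfile in `dir` using the same prefix,
# suffix, and encoding as the walkthrough above.
def build_special_tempfile(dir)
  Tempfile.new(['my_special_file', '.gif'], dir, encoding: Encoding::UTF_8)
end

# Use a scratch directory so the example does not depend on a real path.
Dir.mktmpdir do |dir|
  file = build_special_tempfile(dir)
  begin
    file.write('Hello, tempfile!')
    file.flush                                  # push the buffered write to disk
    name = File.basename(file.path)
    puts name                                   # the unique middle part differs on every run
    puts name.start_with?('my_special_file')    # true
    puts File.extname(name)                     # .gif
    puts File.read(file.path)                   # Hello, tempfile!
  ensure
    file.close!                                 # close and delete before the directory goes away
  end
end
```

The explicit `close!` matters here because `Dir.mktmpdir` removes the directory when its block ends.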
Example Application
URL to Tempfile: Remote File Processing
We have a service that takes a URL and processes the file it represents using a Java command-line utility. Our command-line utility expects a filepath argument, so we must create a local file from the remote resource before processing.
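require 'open-uri'
require 'pathname'
require 'tempfile'

class LocalResource
  attr_reader :uri

  def initialize(uri)
    @uri = uri
  end

  # Copies the remote contents into a tempfile and closes it, ready for other tools
  def file
    @file ||= Tempfile.new(tmp_filename, tmp_folder, encoding: encoding).tap do |f|
      io.rewind
      f.write(io.read)
      f.close
    end
  end

  # OpenURI adds #open to HTTP(S) URI objects
  def io
    @io ||= uri.open
  end

  def encoding
    io.rewind
    io.read.encoding
  end

  # Tempfile expects [prefix, extension] as strings. Strip the extension from
  # the prefix so it does not appear twice in the generated filename.
  def tmp_filename
    [
      Pathname.new(uri.path).basename('.*').to_s,
      Pathname.new(uri.path).extname
    ]
  end

  def tmp_folder
    # If we're using Rails:
    Rails.root.join('tmp')
    # Otherwise:
    # '/wherever/you/want'
  end
end

def local_resource_from_url(url)
  LocalResource.new(URI.parse(url))
end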
# URL is provided as input
url = 'https://s3.amazonaws.com/your-bucket/file.gif'
begin
  # We create a local representation of the remote resource
  local_resource = local_resource_from_url(url)
  # We have a copy of the remote file for processing
  local_copy_of_remote_file = local_resource.file
  # Do your processing with the local file
  `some-command-line-utility #{local_copy_of_remote_file.path}`
ensure
  # It's a good idea to explicitly close and unlink your tempfiles.
  # The safe navigation operator covers the case where the download failed
  # before the tempfile was ever created.
  local_copy_of_remote_file&.close
  local_copy_of_remote_file&.unlink
end
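The create, process, and clean up steps can also be wrapped in a block-style helper so that callers cannot forget the cleanup. The following is only a sketch: `with_local_copy` is a hypothetical name, and a `StringIO` stands in for the remote IO so the example runs without network access. In the real flow you would pass the object returned by `uri.open`.

```ruby
require 'tempfile'
require 'stringio'

# Hypothetical helper: copies everything readable from `io` into a tempfile
# named after `filename`, yields the tempfile's path, and always removes the
# file afterwards.
def with_local_copy(io, filename)
  extension = File.extname(filename)
  prefix = File.basename(filename, extension)
  tempfile = Tempfile.new([prefix, extension], binmode: true)
  begin
    io.rewind
    IO.copy_stream(io, tempfile)  # streams in chunks instead of reading it all at once
    tempfile.close                # flush to disk so other processes see the data
    yield tempfile.path
  ensure
    tempfile.close!               # close (harmless if already closed) and unlink
  end
end

# A StringIO stands in for the downloaded file here.
with_local_copy(StringIO.new('fake image bytes'), 'file.gif') do |path|
  puts File.extname(path)  # .gif
  puts File.size(path)     # 16
end
```

Yielding the path rather than the Tempfile object suits the command-line case, where the external tool only needs a filepath.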
Tempfiles vs Files
Ruby Tempfile objects act almost identically to regular File objects, but have a couple of advantages for transient processing or uploading tasks:
- Tempfiles' filenames are unique, so you can put them in a shared tmp directory without worrying about name collision.
- Tempfiles' files are deleted when the Tempfile object is garbage collected. This prevents a bunch of extra files from accidentally accumulating on your machine. (But you should of course still explicitly close Tempfiles after working with them.)
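Both points can be checked with a short sketch. `unique_paths` is a hypothetical helper written for this illustration; it creates several tempfiles with the same basename, records their paths, and removes them explicitly with `close!` rather than waiting for garbage collection.

```ruby
require 'tempfile'

# Hypothetical helper: creates `count` tempfiles sharing one basename and
# returns their paths. The files are removed before the method returns.
def unique_paths(basename, count)
  files = Array.new(count) { Tempfile.new(basename) }
  files.map(&:path)
ensure
  files&.each(&:close!)  # close and unlink each file right away
end

paths = unique_paths('report', 3)
puts paths.uniq.length == paths.length        # true: no name collisions
puts paths.none? { |path| File.exist?(path) } # true: close! deleted the files
```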
Common Snags
Rewind
Certain IO operations (like reading contents to determine an encoding) move the file pointer away from the start of the IO object. In these cases, you will run into trouble when you attempt to perform subsequent operations (like reading the contents to write to a tempfile). Move the pointer back to the beginning of the IO object using #rewind.
io_object = StringIO.new("I'm an IO!")
encoding = io_object.read.encoding
# The pointer is now at the end of 'io_object'.
# When we read it again, the return is an empty string.
io_object.read
# => ""
# But if we rewind first, we can then read the contents.
io_object.rewind
io_object.read
# => "I'm an IO!"
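The same snag applies to a tempfile you have just written to, since writing also leaves the pointer at the end. A small sketch, assuming a hypothetical helper named `read_before_and_after_rewind` and using `Tempfile.create`, which yields a plain File and removes it when the block ends:

```ruby
require 'tempfile'

# Hypothetical helper: writes `text` to a tempfile, then reads it once
# without rewinding and once after rewinding.
def read_before_and_after_rewind(text)
  Tempfile.create('rewind-demo') do |file|
    file.write(text)
    before = file.read  # the pointer is at the end, so nothing is left to read
    file.rewind
    after = file.read   # now the full contents come back
    [before, after]
  end
end

before, after = read_before_and_after_rewind('Hello, tempfile!')
puts before.empty?  # true
puts after          # Hello, tempfile!
```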
Encoding
Often you'll need to ensure the proper encoding of your tempfiles. You can provide your desired encoding during Tempfile initialization as demonstrated below.
encoding = Encoding::UTF_8
Tempfile.new('some-filename', '/some/tmp/dir', encoding: encoding).tap do |file|
  # Your code here...
end
Obviously your desired encoding won't always be the same for every file. You can find your desired encoding on the fly by sending #encoding to your file contents string. Or if you're using an IO object, you can call io_object.read.encoding (and rewind afterwards if you need to read the contents again).
encoding = file_contents_string.encoding
# or
# encoding = io_object.read.encoding
Tempfile.new('some-filename', '/some/tmp/dir', encoding: encoding).tap do |file|
  # Your code here...
end
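As a quick check that the encoding option takes effect, the sketch below writes a Latin-1 string to a tempfile opened with that string's own encoding. `tempfile_matching` is a hypothetical helper, and the example assumes Ruby's default of no internal encoding (`Encoding.default_internal` is nil).

```ruby
require 'tempfile'

# Hypothetical helper: writes `contents` to a tempfile opened with the
# string's own encoding and returns the still-open Tempfile, rewound.
def tempfile_matching(contents)
  Tempfile.new('encoded', encoding: contents.encoding).tap do |file|
    file.write(contents)
    file.rewind
  end
end

latin1 = 'café'.encode(Encoding::ISO_8859_1)
file = tempfile_matching(latin1)
puts file.external_encoding  # ISO-8859-1
puts file.read == latin1     # true: same bytes, same encoding
file.close!
```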
Read more about Ruby encoding.
Extensions
By default, files created with Tempfile.new will not carry an extension. This can pose problems for applications or tools (like CarrierWave and soffice) that rely on a file's extension to perform their operations.
In these cases, you can pass an extension to the Tempfile initialization as demonstrated above in Anatomy of Tempfile#new.
# A quick refresher
Tempfile.new(['file_name_prefix', '.extension'], '/tmp')
If you need to dynamically determine your file's extension, you can usually grab it from the URL or file path you are reading into your Tempfile:
uri = URI.parse('https://example.com/some/path/to/file.gif')
path = '/some/path/to/file.gif'
Pathname.new(uri.path).extname
# => '.gif'
Pathname.new(path).extname
# => '.gif'
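Putting the two pieces together, the sketch below derives both the prefix and the extension from a URL and hands them to Tempfile.new. `tempfile_name_parts` is a hypothetical helper; note that the extension is stripped from the prefix so it does not appear twice in the generated name.

```ruby
require 'tempfile'
require 'uri'

# Hypothetical helper: splits a URL's filename into the [prefix, extension]
# pair that Tempfile.new accepts as its first argument.
def tempfile_name_parts(url)
  path = URI.parse(url).path
  extension = File.extname(path)
  [File.basename(path, extension), extension]
end

parts = tempfile_name_parts('https://example.com/some/path/to/file.gif')
p parts                       # ["file", ".gif"]

file = Tempfile.new(parts)
puts File.extname(file.path)  # .gif
file.close!
```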
Local Development (Paths vs URLs)
Many developers use local file storage for their development environment. In these cases, local file paths often appear in methods that are expecting URLs. Not fun.
OpenURI to the Rescue
If you need to write code that supports reading files from both file paths and URLs, OpenURI is your saviour.
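OpenURI is accessible via URI.open, which provides a file-like API for both local and remote resources. (Older Rubies also allowed the Kernel function open to take URLs, but that behavior was deprecated in Ruby 2.7 and removed in 3.0.)
require 'open-uri'
URI.open('/path/to/your/local/file.gif') do |file|
  # Your code here...
end
URI.open('https://s3.amazonaws.com/your-bucket/file.gif') do |file|
  # Your code here...
end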
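If you prefer to make the path-versus-URL decision explicit, a small wrapper can dispatch on the shape of the string. This is a sketch only: `open_resource` is a hypothetical helper, and treating anything that starts with `http://` or `https://` as remote is an assumption that may not fit every application.

```ruby
require 'open-uri'

# Hypothetical helper: opens `location` through OpenURI when it looks like an
# HTTP(S) URL and through File.open otherwise, yielding a readable IO either way.
def open_resource(location, &block)
  location = location.to_s
  if location.match?(%r{\Ahttps?://}i)
    URI.parse(location).open(&block)    # remote: OpenURI adds #open to HTTP(S) URIs
  else
    File.open(location, 'rb', &block)   # local: a plain binary read
  end
end

# Usage (the paths and URLs are placeholders):
# open_resource('/path/to/your/local/file.gif') { |io| io.read }
# open_resource('https://s3.amazonaws.com/your-bucket/file.gif') { |io| io.read }
```

Keeping local paths on File.open means a string that is not a URL is never interpreted as anything other than a file path.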
We like Ruby Tempfiles for performing file-oriented operations on remote resources. What do you use?
Thanks to Ryan Foster for his contributions to the sample code.