Your friends at Viget present Extend, a Code & Technology Blog

Make Remote Files Local with Ruby Tempfile

We live in the age of remote resources. It's pretty rare to store uploaded files on the same machine as your server process. File storage these days is almost completely remote, and for very good reasons.

Using file storage services like S3 is awesome, but not having your files accessible locally can complicate file-oriented operations. In these cases, use Ruby's Tempfile to create local files that live just long enough to satisfy your processing needs.

Anatomy of Tempfile#new

# What you want the tempfile filename to start with
name_start = 'my_special_file'

# What you want the tempfile filename to end with (probably including an extension)
# By default, tempfile names carry no extension
name_end = '.gif'

# Where you want the tempfile to live
location = '/path/to/some/dir'

# Options for the tempfile (e.g. which encoding to use)
options = { encoding: Encoding::UTF_8 }

# Will create a tempfile
# at /path/to/some/dir/my_special_file20140224-1234-abcd123.gif
# (where '20140224-1234-abcd123' is the date, the process ID, and a random token)
# with a UTF-8 encoding
# with the contents 'Hello, tempfile!'
# Note: Tempfile.new does not call a block, so we use #tap to work with the new file.
Tempfile.new([name_start, name_end], location, **options).tap do |file|
  file.write('Hello, tempfile!')
end
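
If you only need the file for the duration of a block, Tempfile.create may be a better fit than Tempfile.new: it yields a plain File and removes it when the block exits. Here is a minimal sketch, assuming the system temp directory (Dir.tmpdir) stands in for '/path/to/some/dir':

```ruby
require 'tempfile'
require 'tmpdir'

# Dir.tmpdir is an assumption standing in for '/path/to/some/dir'.
created_path = nil

contents = Tempfile.create(['my_special_file', '.gif'], Dir.tmpdir, encoding: Encoding::UTF_8) do |file|
  created_path = file.path
  file.write('Hello, tempfile!')
  file.rewind # move back to the start before reading
  file.read   # the block's value becomes the return value of Tempfile.create
end

puts contents
puts File.exist?(created_path) # the file is removed once the block exits
```

The trade-off is that the file is gone as soon as the block returns, so any processing that needs the path has to happen inside the block.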

Example Application

URL to Tempfile: Remote File Processing

We have a service that takes a URL and processes the file it represents using a Java command-line utility. Our command-line utility expects a filepath argument, so we must create a local file from the remote resource before processing.

require 'open-uri'
require 'pathname'
require 'tempfile'

class LocalResource
  attr_reader :uri

  def initialize(uri)
    @uri = uri
  end

  def file
    @file ||= Tempfile.new(tmp_filename, tmp_folder, encoding: encoding).tap do |f|
      io.rewind
      f.write(io.read)
      f.close
    end
  end

  def io
    @io ||= uri.open
  end

  def encoding
    io.rewind
    io.read.encoding
  end

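  def tmp_filename
    # Tempfile expects [prefix, suffix] as Strings. Strip the extension from
    # the prefix so it is not repeated in the generated filename.
    path = Pathname.new(uri.path)
    [
      path.basename(path.extname).to_s,
      path.extname
    ]
  end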

  def tmp_folder
    # If we're using Rails:
    Rails.root.join('tmp')
    # Otherwise:
    # '/wherever/you/want'
  end
end

def local_resource_from_url(url)
  LocalResource.new(URI.parse(url))
end

# URL is provided as input
url = 'https://s3.amazonaws.com/your-bucket/file.gif'

begin
  # We create a local representation of the remote resource
  local_resource = local_resource_from_url(url)

  # We have a copy of the remote file for processing
  local_copy_of_remote_file = local_resource.file

  # Do your processing with the local file
  `some-command-line-utility #{local_copy_of_remote_file.path}`
ensure
  # It's a good idea to explicitly close and unlink your tempfiles.
  # (&. guards against an error raised before the tempfile was created.)
  local_copy_of_remote_file&.close
  local_copy_of_remote_file&.unlink
end
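
For large remote files, reading the whole body into memory before writing it out can be wasteful. Here is a sketch of a streaming alternative using IO.copy_stream. The copy_to_tempfile helper is hypothetical, and a StringIO stands in for the IO object that uri.open would return:

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: copies any readable IO into a closed tempfile.
def copy_to_tempfile(source_io, name_parts)
  Tempfile.new(name_parts).tap do |tempfile|
    tempfile.binmode                    # copy bytes as-is, without transcoding
    source_io.rewind
    IO.copy_stream(source_io, tempfile) # copies in chunks instead of one big read
    tempfile.close                      # flush to disk; the file remains until unlink
  end
end

# Stand-in for the remote resource (an assumption for this sketch).
remote_io = StringIO.new('GIF89a fake image bytes')

tempfile = copy_to_tempfile(remote_io, ['file', '.gif'])
begin
  copied_path = tempfile.path
  copied_size = File.size(copied_path)
  puts copied_size == remote_io.size
ensure
  tempfile.close
  tempfile.unlink
end
```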

Tempfiles vs Files

Ruby Tempfile objects act almost identically to regular File objects, but have a couple of advantages for transient processing or uploading tasks:

  • Tempfiles' filenames are unique, so you can put them in a shared tmp directory without worrying about name collision.
  • Tempfiles' files are deleted when the Tempfile object is garbage collected. This prevents a bunch of extra files from accidentally accumulating on your machine. (But you should of course still explicitly close Tempfiles after working with them.)
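
Both points can be checked with a short sketch. The 'upload' prefix and '.txt' extension are arbitrary choices for illustration:

```ruby
require 'tempfile'

# Two tempfiles with the same prefix and extension get distinct paths.
first  = Tempfile.new(['upload', '.txt'])
second = Tempfile.new(['upload', '.txt'])

paths = [first.path, second.path]
puts paths.uniq.length

# Closing and unlinking removes the files right away rather than
# waiting for garbage collection.
[first, second].each do |tempfile|
  tempfile.close
  tempfile.unlink
end

puts paths.none? { |path| File.exist?(path) }
```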

Common Snags

Rewind

Certain IO operations (like reading contents to determine an encoding) move the file pointer away from the start of the IO object. In these cases, you will run into trouble when you attempt to perform subsequent operations (like reading the contents to write to a tempfile). Move the pointer back to the beginning of the IO object using #rewind.

io_object = StringIO.new("I'm an IO!")
encoding = io_object.read.encoding

# The pointer is now at the end of 'io_object'.
# When we read it again, the return is an empty string.
io_object.read
# => ""

# But if we rewind first, we can then read the contents.
io_object.rewind
io_object.read
# => "I'm an IO!"

Encoding

Often you'll need to ensure the proper encoding of your tempfiles. You can provide your desired encoding during Tempfile initialization as demonstrated below.

encoding = Encoding::UTF_8

Tempfile.new('some-filename', '/some/tmp/dir', encoding: encoding).tap do |file|
  # Your code here...
end

Obviously your desired encoding won't always be the same for every file. You can find your desired encoding on the fly by sending #encoding to your file contents string. Or if you're using an IO object, you can call io_object.read.encoding (and remember to rewind the object afterwards).

encoding = file_contents_string.encoding
# or
# encoding = io_object.read.encoding

Tempfile.new('some-filename', '/some/tmp/dir', encoding: encoding).tap do |file|
  # Your code here...
end
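
To see the effect of the encoding option, here is a small sketch that writes an ISO-8859-1 string and reads it back. The sample string and the 'encoding-demo' filename are assumptions made for illustration:

```ruby
require 'tempfile'

# A sample string in a non-UTF-8 encoding.
contents = "caf\u00E9".encode(Encoding::ISO_8859_1)

tempfile = Tempfile.new('encoding-demo', encoding: contents.encoding)
begin
  tempfile.write(contents)
  tempfile.rewind # move back to the start before reading

  file_encoding = tempfile.external_encoding
  read_back = tempfile.read

  puts file_encoding
  puts read_back.encoding
  puts read_back == contents
ensure
  tempfile.close
  tempfile.unlink
end
```

Strings read from the tempfile are tagged with the encoding it was opened with, so downstream code sees the same encoding the source contents had.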

Read more about Ruby encoding.

Extensions

By default, files created with Tempfile.new will not carry an extension. This can pose problems for applications or tools (like CarrierWave and soffice) that rely on a file's extension to perform their operations.

In these cases, you can pass an extension to the Tempfile initialization as demonstrated above in Anatomy of Tempfile#new.

# A quick refresher
Tempfile.new(['file_name_prefix', '.extension'], '/tmp')

If you need to dynamically determine your file's extension, you can usually grab it from the URL or file path you are reading into your Tempfile:

uri = URI.parse('https://example.com/some/path/to/file.gif')
path = '/some/path/to/file.gif'

Pathname.new(uri.path).extname
# => '.gif'

Pathname.new(path).extname
# => '.gif'
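
Putting the two steps together, here is a sketch that derives both the filename prefix and the extension from a URL or a path and passes them to Tempfile.new. The tempfile_name_parts helper is hypothetical:

```ruby
require 'pathname'
require 'tempfile'
require 'uri'

# Hypothetical helper: splits a URL or file path into [prefix, extension] Strings.
def tempfile_name_parts(url_or_path)
  path = Pathname.new(URI.parse(url_or_path).path)
  extension = path.extname
  [path.basename(extension).to_s, extension]
end

url_parts  = tempfile_name_parts('https://example.com/some/path/to/file.gif')
path_parts = tempfile_name_parts('/some/path/to/file.gif')

tempfile = Tempfile.new(url_parts)
begin
  generated_extension = File.extname(tempfile.path)
  puts generated_extension
ensure
  tempfile.close
  tempfile.unlink
end
```

Converting the prefix to a String matters here, because Tempfile.new expects String name parts rather than Pathname objects.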

Local Development (Paths vs URLs)

Many developers use local file storage for their development environment. In these cases, local file paths often appear in methods that are expecting URLs. Not fun.

OpenURI to the Rescue

If you need to write code that supports reading files from both file paths and URLs, OpenURI is your saviour.

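OpenURI provides URI.open, which offers a file-like API for both local and remote resources. (Before Ruby 3.0, OpenURI also let you pass URLs to the Kernel function open; that behavior was deprecated in Ruby 2.7 and removed in 3.0.)

require 'open-uri'

URI.open('/path/to/your/local/file.gif') do |file|
  # Your code here...
end
URI.open('https://s3.amazonaws.com/your-bucket/file.gif') do |file|
  # Your code here...
end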
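
On current Ruby versions, URI.open (from open-uri) is the entry point for this. A minimal sketch of a helper that accepts either kind of input follows. The read_resource name is hypothetical, and the example only exercises a local file so that it runs offline:

```ruby
require 'open-uri'
require 'tempfile'

# Hypothetical helper: reads a local path or a URL through the same call.
# URI.open hands URLs to OpenURI and falls back to Kernel#open for plain paths.
def read_resource(path_or_url)
  URI.open(path_or_url) { |io| io.read }
end

# Create a local file to read (a stand-in for a file in local storage).
local_file = Tempfile.new(['local-demo', '.txt'])
local_file.write('stored on local disk')
local_file.close

local_contents = read_resource(local_file.path)
local_file.unlink

puts local_contents

# A URL goes through the same helper (not run here; it needs network access):
# read_resource('https://s3.amazonaws.com/your-bucket/file.gif')
```

Because plain strings fall through to Kernel#open, only pass trusted input to a helper like this.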

We like Ruby Tempfiles for performing file-oriented operations on remote resources. What do you use?

Thanks to Ryan Foster for his contributions to the sample code.

