Using Google Cloud Storage in your Django Project
A few tips on some simple and not-so-simple ways to use Google Cloud Storage in a Django Project
Earlier this year, we were tasked with implementing Google Cloud Storage (GCS) on a Django project. GCS would be used to store uploaded images and static assets.
We also had an additional complication: we needed two separate storage buckets, one for public assets, and one for private ones. For context, a storage bucket in GCS is just the container for the data you are saving there (read more about buckets).
Let's take a look at a way to set up GCS in a Django project, along with a way to go about implementing multiple bucket storage.
My use case #
I needed two buckets; one public, one private. The public bucket would be used for most things in the app: static images, uploaded files, etc. The private bucket would be used as a way to store files that drive some data visualizations. These files are large (1-2 GB) CSV files, not something users would need to see or download.
The way I solved this is with django-storages. This is a package meant to handle cloud storage for a variety of cloud providers, like Google, Amazon, and Microsoft. We'll be looking at some GCS-specific scenarios, but the ideas are fairly translatable between those three cloud providers.
Basic GCS Django setup #
Before tackling multiple buckets, here is how you could set up baseline GCS storage with django-storages.
- Install the package.
pip install django-storages[google]
- Add the necessary and/or helpful settings to your settings file (YMMV with what settings you need).
from datetime import timedelta

from google.oauth2 import service_account

# ...

GS_BUCKET_NAME = "YOUR BUCKET NAME"
DEFAULT_FILE_STORAGE = "storages.backends.gcloud.GoogleCloudStorage"
MEDIA_URL = "URL.to.GCS"
GS_CREDENTIALS = service_account.Credentials.from_service_account_file(
    "path/to/credentials.json"
)
GS_EXPIRATION = timedelta(minutes=5)
GS_BLOB_CHUNK_SIZE = 1024 * 256 * 40  # Needed for uploading large streams, entirely optional otherwise
- To break down these settings:
  - GS_BUCKET_NAME is the name of your primary bucket in GCS.
  - DEFAULT_FILE_STORAGE is the storage class Django uses when storing almost anything.
  - MEDIA_URL lets Django know where to look for stored files.
  - GS_CREDENTIALS points django-storages at the credentials.json file you get from GCS.
  - GS_EXPIRATION is how long a generated URL stays valid. The default is a day; however, we were tasked with shortening it to five minutes, so that URLs to uploaded PDFs or similar could not be sent around to anyone. This setting is needed only for non-public buckets in GCS that want signed URLs. I mentioned that we had a "public" and a "private" bucket, but in terms of GCS settings, neither bucket is "public". That way, only users authenticated in our system can generate a signed URL to see uploaded files. This contrasts with our actually "private" bucket, where no non-admin user can access files at all.
  - GS_BLOB_CHUNK_SIZE is needed when uploading large files. See the docs for both django-storages and GCS for more information on chunk size.
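One more optional piece: the project also stored static assets in GCS. If you want collectstatic output to land in a bucket as well, a minimal sketch of the extra setting (this assumes the pre-Django-4.2 style of storage settings, matching the rest of this post):

# Optional: send collectstatic output to GCS as well
STATICFILES_STORAGE = "storages.backends.gcloud.GoogleCloudStorage"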
Once the package is installed and these settings are in place, you should be ready to use GCS.
# the following code is from the django-storages docs
>>> from django.db import models
>>> class Resume(models.Model):
...     pdf = models.FileField(upload_to='pdfs')
...     photos = models.ImageField(upload_to='photos')
...
>>> resume = Resume()
>>> print(resume.pdf.storage)
<storages.backends.gcloud.GoogleCloudStorage object at ...>
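To see the pieces working together, here is a rough interactive sketch building on the Resume model above (the file name and contents are made up; with GS_EXPIRATION set as above, the generated URL is a signed link that stops working after five minutes):

>>> from django.core.files.base import ContentFile
>>> resume = Resume()
>>> resume.pdf.save('my-resume.pdf', ContentFile(b'%PDF-1.4 ...'), save=False)  # made-up file contents
>>> resume.pdf.url  # signed URL into GS_BUCKET_NAME, valid for GS_EXPIRATION
'https://storage.googleapis.com/...'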
Back to the problem at hand: multiple buckets #
Support for multiple buckets is not something that is necessarily built into django-storages. To give a little more context, we needed a FileField on only one model to go to a different bucket. Every other FileField instance should go to the default bucket.
- Add another setting to your settings file.
PRIVATE_GS_BUCKET_NAME = "other-bucket-name"
- Create a class that subclasses the storage class provided by django-storages (we'll sanity-check it right after these steps).
from django.utils.deconstruct import deconstructible
from django.conf import settings
from storages.backends.gcloud import GoogleCloudStorage

@deconstructible
class PrivateGCSMediaStorage(GoogleCloudStorage):
    def __init__(self, *args, **kwargs):
        kwargs["bucket_name"] = getattr(settings, "PRIVATE_GS_BUCKET_NAME")
        super(PrivateGCSMediaStorage, self).__init__(*args, **kwargs)
- Go to your model file, and add a new argument to the FileField constructor.
from django.db import models

from myfile import PrivateGCSMediaStorage

class Upload(models.Model):
    csv_file = models.FileField(
        storage=PrivateGCSMediaStorage,
        # any other settings...
    )
- Then, just run the commands to make and apply migrations and you should be set!
python manage.py makemigrations
python manage.py migrate
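Before breaking the class down, here's a hedged way to sanity-check it from a shell; the object name and CSV contents are invented for the example:

>>> from django.core.files.base import ContentFile
>>> private_storage = PrivateGCSMediaStorage()
>>> private_storage.bucket_name  # pulled from PRIVATE_GS_BUCKET_NAME
'other-bucket-name'
>>> name = private_storage.save('reports/example.csv', ContentFile(b'col_a,col_b\n1,2\n'))
>>> private_storage.url(name)  # signed URL into the private bucket
'https://storage.googleapis.com/...'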
Breaking it down #
- But hold on, what did that all accomplish? Let's break down the custom class.

@deconstructible

- This is a decorator that adds a deconstruct method to the class, allowing it to be serialized and used in Django migrations. Read more about this process.

class PrivateGCSMediaStorage(GoogleCloudStorage):

- We are subclassing the class provided to us by django-storages.

def __init__(self, *args, **kwargs):
    kwargs["bucket_name"] = getattr(settings, "PRIVATE_GS_BUCKET_NAME")
    super(PrivateGCSMediaStorage, self).__init__(*args, **kwargs)

- This overrides GoogleCloudStorage's initialization method. All we are doing is providing it a custom "bucket_name" attribute, set to our private bucket's name. That way, when a file is stored through this storage class, it goes to that separate bucket. Note that GoogleCloudStorage normally gets its bucket name from the required django-storages settings variable, GS_BUCKET_NAME.

Then, we can pass the class reference itself to the storage argument on the FileField. The docs tell us that we can use a storage object, or a callable which returns a storage object. The callable form is very useful if, locally, you don't want files stored in two separate buckets; you'd rather have them stored the default way, all in one bucket.
Here is how you'd go about that #
- Set a flag in your settings
USE_PRIVATE_STORAGE = False # will be set to True in production or wherever
- Then, add this function under your PrivateGCSMediaStorage class (or wherever you want).
from django.core.files.storage import default_storage

def select_storage():
    return PrivateGCSMediaStorage() if settings.USE_PRIVATE_STORAGE else default_storage
- This way, the Upload model will use default_storage if that setting is set to False. default_storage is what FileField falls back to when you do not provide a storage keyword argument, so all models will use the same storage type. (There's a quick sanity check of this after these steps.)
- Change your FileField like so:
class Upload(models.Model):
    csv_file = models.FileField(
        storage=select_storage,
        # any other settings...
    )
- Rerun migrations just like above and you will see something similar to this in the generated migration.
operations = [
    migrations.AlterField(
        model_name='upload',
        name='csv_file',
        field=models.FileField(storage=your.project.path.select_storage),
    ),
]
- This therefore allows all files uploaded to the Upload model to be stored in our secondary GCS bucket, while every other file field goes to the default GCS bucket.
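And the sanity check promised above: with the flag off, the field quietly falls back to the default storage (a rough shell sketch):

>>> from django.core.files.storage import default_storage
>>> # With USE_PRIVATE_STORAGE = False, select_storage() hands back default_storage
>>> Upload._meta.get_field('csv_file').storage is default_storage
True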
That's it! #
This could naturally be extended to allow for any number of additional buckets. The flexibility of allowing for multiple buckets with differing levels of security can be incredibly helpful for hiding certain information away from users at the cloud-storage level. Additionally, a similar version of this is possible with the AWS and Microsoft Azure implementations of django-storages, where you can have multiple S3 buckets or Azure Blob Storage containers with similar security constraints. Best of luck with your Django projects!
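If you do end up needing more than two buckets, one hedged way to generalize the pattern is a storage class that takes the settings variable naming the bucket, plus a module-level callable per bucket so the storage argument stays serializable in migrations. The REPORTS_GS_BUCKET_NAME setting and reports_storage helper below are made up for illustration:

from django.conf import settings
from django.utils.deconstruct import deconstructible
from storages.backends.gcloud import GoogleCloudStorage

@deconstructible
class NamedBucketGCSStorage(GoogleCloudStorage):
    """GCS storage that reads its bucket name from an arbitrary settings variable."""

    def __init__(self, settings_key, *args, **kwargs):
        self.settings_key = settings_key
        kwargs["bucket_name"] = getattr(settings, settings_key)
        super().__init__(*args, **kwargs)

def reports_storage():
    # Hypothetical third bucket; pass this callable to FileField(storage=reports_storage)
    return NamedBucketGCSStorage("REPORTS_GS_BUCKET_NAME")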