Streamline Your Workflow with the GA API
Mitch Daniels, Former Senior Digital Analyst
Article Category:
Posted on
While Google Analytics (GA) is our go-to analytics tool across almost every project, sometimes pulling many pieces of data from within the interface grows tedious and time consuming. In these instances, GA’s Core Reporting API can save us a lot of time and frustration. This post will show you how we’ve modified a third-party script to make API pulls reliable and painless.
How it works
There are a number of ways to access the GA API, whether through Java, PHP, Python or Javascript. A while back some of our developers even whipped up a Ruby wrapper for the GA API. While each of these methods has its place, our most-used tool is a Google Spreadsheets plugin by Mikael Thuneberg called Data Fetch Functions. The script provides a means of accessing granular bits of data in a straightforward and flexible format.
To get started with the script, create a new Google Spreadsheet and open Tools > Script Gallery. Search for ‘Google Analytics Data Fetch Functions’ and install the version created by Mikael Thuneberg. Once that script is installed, open it up with Tools > Script Editor to get a better understanding of what it does and how it works. This German blog post provides a great summary of how to get up and running (Google Translate FTW!).
Once you’ve generated an authentication token and grabbed your GA profile ID, get ready to put the ‘fun’ back in function (I’m truly sorry). To start out, we can use the getGAData() function to find the number of site visits for a specific time period as follows:
=getGAdata({authToken}, {profileNumber}, “ga:visits”, 9/1/2013, 9/30/2013)
Once you have that function working, feel free to get a bit more fancy by adding in advanced segments, filters and sorting to capture unique snapshots of your data. As a quick reference, the order of parameters is as follows:
- token
- profile number
- metrics
- start date
- end date
- filters
- segments
- sort (TRUE or FALSE)
- include headers (TRUE or FALSE)
- max rows
- start from row
Google’s documentation provides details about specific formatting requirements:
The Fun Stuff
The real value of the API is that it allows us to pull otherwise disparate bits of data into a single view, to quickly update date ranges for recurring reports, and to then perform additional calculations on the fly. The flexibility of the GA Data Fetch Functions within Google Spreadsheets presents opportunities to create highly customized dashboards or hand-selected groups of KPIs.
In addition, the API can save a ton of time when we want to collect many bits of data that would prove tedious within the GA interface. For example, imagine we have a list of page paths for our website and we want to see, for each page, the number of visits that viewed that particular page during their visit. Pulling each of these numbers from within the GA interface would be tedious and time consuming, requiring a unique Advanced Segment for each calculation. With the GA Data Fetch Functions, we can simply create a function to dynamically calculate each of these requests and we can pull the data almost instantaneously.
This example spreadsheet gives an idea of how we might approach this problem. Once the formulas are configured with the correct absolute and relative references, it’s easy to create a list of hundreds of unique segments and to update the data on the fly!
Except...
...there is one big ‘but’. To protect itself from receiving more data than it can handle, Google places limits on the maximum number of API requests from any given account.
Several quotas apply:
- 50,000 requests per project per day
- 10,000 requests per profile per day
- 10 queries per second (QPS) per IP
- 10 concurrent requests per profile
We’ve never run up against the first two quotas, but numbers three and four can prove challenging when we’re creating workbooks with lists of unique queries. Fortunately, a clever workaround allows us to temper the rate at which we submit queries so that we avoid a nasty error message.
Cascade your queries
When pulling a long list of queries, the easiest way to avoid the quota limit is to hold the execution of each query until the previous query has completed. We use the following formula:
=if(or(isnumber(Previous Query Result),Previous Query Result = “No data found”),getGAData(...),””)
Simply wrap each getGAData() function in the list (except the first one) in this if-statement to cascade execution.
10 queries per second
Cascading queries takes care of the concurrent request limit, but sometimes the API actually pulls data so quickly that we plow through more than 10 queries within 10 seconds. To avoid this problem, we can add a sleep timer to our Data Fetch Functions script. Simply add the following snippet immediately before your getGAdata() function is declared:
Utilities.sleep(1001);
This little guy makes each of our queries wait for 1.001 seconds and ensures we don’t bump up against the 10 QPS limit. While it slows our process a bit, it’s a necessary evil to avoid quota errors.
Caveats and Final Thoughts
Even with all its utility, this toolset carries with it a few important caveats:
- Data sampling is not apparent. GA samples data for queries on large sets of data or for very specific subsets of data (more details here). The GA interface provides a notification that data is sampled, but no such notification exists from within the Data Fetch Functions. Be aware of this fact and be sure to duplicate a few of your queries from within the interface to understand whether or not your data is sampled.
- Data formats aren’t pretty. Each time queries are re-executed, new values will ignore previously set formatting within Google Spreadsheets. In cases where we want a pretty final product, we pull our data into one sheet, usually called ‘Raw’, and then do our nice formatting in a totally separate sheet by referencing values across sheets.
- Durability isn’t guaranteed. While the Data Fetch Functions have proved robust enough for most of our applications thus far, it’s important to remember that they weren’t officially developed by Google, so support for future changes and updates may be slow or limited.
Huge kudos to Mikael Thuneberg for developing the original Data Fetch Functions. We’ve found them hugely useful and continue to use them on a near-daily basis. In addition the Data Fetch Functions available for Google Spreadsheets, he’s created an even more robust Supermetrics Data Grabber for Excel that provides a number of additional features.
Let us know how you fare with the Data Fetch Functions, particularly when pulling large data sets. Have you found other ways to quickly pull large numbers of unique queries? We’re always on the lookout for ways to improve the way we work!