Removing Referral Spam from Google Analytics

Ben Travis, Former Marketing Manager

Article Categories: #Strategy, #Data & Analytics


Step-by-step instructions for removing referral spam from your Google Analytics data.

Updates

Although there are rumblings of an official solution from Google, referral spam continues to pop up. This update targets new spam referrers including rank-checker, keywords-monitoring-success, free-video-tool, unpredictable, and teedle. I highly recommend implementing the hostname filter before resorting to other solutions, if possible.

You know what really grinds my gears? Opening up a report in Google Analytics and having to deal with referral spam. In this post, I’ll tell you how to deal with referral spam and why it’s dangerous.


In the past year, I’ve noticed an alarming trend of referral spam creeping into my Google Analytics reports. Referral spam is the practice of sending bogus referral traffic to a website or product. It may sound relatively harmless, but referral spam is quickly turning into a serious issue.

Types of Referral Spam

In the context of Google Analytics, referral spam comes in two main flavors: spammy web crawlers and ghost referral traffic.

Web crawlers are robots that visit websites, usually with the intention of indexing content. Most web crawlers identify themselves as such to web servers and are then left out of analytics reports. However, some web crawlers like those from Semalt (boo!) don’t identify themselves as robots and end up in analytics reports as sessions with a 100% bounce rate and 0-second duration. Google recently introduced a feature to filter out known bots and spiders, though it’s definitely not perfect (more on that later).

Ghost referral traffic, arguably the greater of the two referral spam evils, never actually visits a website. In these cases, spammers exploit the fact that Google Analytics now transfers information via HTTP requests directly to Google Analytics servers, meaning someone can “spoof” a session very easily. Ghost referral traffic can be generated by a simple program that sends fake HTTP requests aimed at different Google Analytics properties, so this traffic doesn’t even hit your site. Even more annoying is the fact that this type of spam can be used to spoof organic search results and send false events, as well. See the screenshot below for an example:

Google Analytics Spam Traffic Report

Note: For ghost referral traffic, modifying .htaccess won’t help at all, since these spammers never actually visit your site. For more information, view Google's Measurement Protocol documentation.
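To make the "ghost" idea concrete, here is a minimal sketch that only parses a hit payload and never sends anything. It assumes the Universal Analytics Measurement Protocol parameter names (tid, dh, dp, dr); the property ID, client ID, and domains are made up for illustration. The point is that the hostname and referrer are whatever the sender claims they are.

```python
from urllib.parse import parse_qs

# Hypothetical hit payload in Measurement Protocol format.
# The property ID, client ID, and domains are invented for this example.
payload = (
    "v=1&t=pageview&tid=UA-00000-1&cid=555"
    "&dh=apple.com&dp=%2F&dr=http%3A%2F%2Fspam-referrer.example%2F"
)


def describe_hit(raw: str) -> dict:
    """Return the self-reported property, hostname, page, and referrer."""
    fields = {key: values[0] for key, values in parse_qs(raw).items()}
    return {
        "property": fields.get("tid"),
        "hostname": fields.get("dh"),   # claimed by the sender, never verified
        "page": fields.get("dp"),
        "referrer": fields.get("dr"),   # claimed by the sender, never verified
    }


hit = describe_hit(payload)
print(hit)
```

Nothing in the payload ties the hit to a real visit, which is why server-side blocking can't touch this kind of spam.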

Negative Implications 

“A referrer is a simple HTTP header that's passed along when a browser goes from one page to another page, normally used to indicate where a user's coming from. But users can change it, and some people will set referrer at pages they want to promote and visit tons of people around the web -- people see it and say 'Oh, I should check it out'. It's not necessarily a link… there are some people who try to drive traffic by visiting a ton of websites with an automated script and setting the referrer to be the URL they want to promote... there's no 'authentication'… You can’t automatically assume that it was the owner of the URL if you see something showing up in your dashboard. Somebody is trying to do some hijinx.”
- Matt Cutts, Head of Google Webspam Team

So, why is referral spam so bad? For one, it’s screwing up my web analytics data. “Sessions” entering via referral spam skew the data, clouding the accuracy of engagement metrics and inflating traffic volume metrics. Unfortunately, those unaware of spam issues may base decisions on inaccurate data, especially for sites with low traffic.

Moreover, referral spam makes SEO more difficult for everyone. One aim of referral spam is to earn backlinks from sites that publish their access logs. Some websites publish web analytics data publicly, which can include hyperlinks back to the spammer’s designated URL. These backlinks can improve search engine results for that URL since many websites publishing referrer data are presumably trustworthy.

There are also more nefarious opportunities available to referral spammers. If a spammer wanted to send a website unwanted and unqualified traffic, they could simply set the referrer URL to the victim’s URL. As mentioned in the above quote from Matt Cutts, referral spam can’t truly be “authenticated” and tracked back to a specific source. With this in mind, referral spam could be used to harm reputations, possibly framing an innocuous website as a spam referrer.

Exposure to malware is another potential threat to anyone curious enough to visit referral spam addresses. With the rise of electronic data theft, it would be simple for referrer spam networks to point to URLs containing malicious software aimed at stealing valuable information.

Finally, no one wants to be advertised to while looking at web analytics acquisition reports.

Solutions

Within Google Analytics, there are multiple options to remove referral spam:

Exclude Foreign Hostnames and Filter Spammy Crawlers

One defining attribute of many ghost referrals is an inaccurate hostname attribution. When reviewing referral data in Google Analytics, the hostname will be completely unrelated to your website (e.g., “apple.com”). With this knowledge, it’s relatively simple to create a filter to only include data with an accurate hostname. For Google Analytics users with only one or a handful of domains, this solution may be the simplest (check here for a quick refresher on regular expressions in GA):

Google Analytics Hostname Filter
In most cases, substituting your top domain name for example.com will be sufficient. For multiple domains, check your regular expressions with Regex Pal. This filter will also address the recent uptick in direct traffic with a hostname of "(not set)".
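Before saving a hostname filter, it can help to try the pattern against a few hostnames offline. The sketch below is only an approximation: the pattern is an assumed one for a site on example.com and its subdomains (not necessarily the one in the screenshot), and Python's re.search stands in for the partial matching that GA filter patterns perform.

```python
import re

# Assumed include pattern: example.com plus any subdomain of it.
# Swap in your own domain; this is not the pattern from the screenshot.
VALID_HOSTNAME = re.compile(r"(^|\.)example\.com$")


def keep_hit(hostname: str) -> bool:
    """Mimic an Include filter: keep only hits whose hostname matches."""
    return VALID_HOSTNAME.search(hostname) is not None


samples = [
    "example.com",
    "www.example.com",
    "apple.com",             # typical ghost-referral hostname
    "(not set)",             # hostname missing entirely
    "example.com.spam.xyz",  # lookalike that only starts with the domain
]
kept = [hostname for hostname in samples if keep_hit(hostname)]
print(kept)
```

Anchoring the end of the pattern with $ is a deliberate choice here, so that lookalike hostnames that merely begin with your domain are dropped too.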

That first filter will remove ghost referral traffic that reports a foreign hostname. However, an additional filter will also be required to remove spammy web crawlers (like Semalt) since they actually visit the site and will report an accurate hostname. A solution to remove the two most popular web crawler offenders can be seen below using an Exclude Campaign Source filter:

Google Analytics Filter for Spammy Webcrawlers

Featured Regular Expression:

.*(semalt(media)?|buttons\-for\-website)\.com.*

Note: You should always retain an unfiltered view, as data processed by GA filters cannot be reverted.
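The crawler expression above can be sanity-checked the same way. This is a rough sketch that uses Python's re module as a stand-in for GA's matching; the sample source values are made up.

```python
import re

# The expression from this post, copied verbatim.
CRAWLER_SPAM = re.compile(r".*(semalt(media)?|buttons\-for\-website)\.com.*")


def is_spam_source(source: str) -> bool:
    """Mimic an Exclude Campaign Source filter for the crawler pattern."""
    return CRAWLER_SPAM.search(source) is not None


# Hypothetical campaign source values as they might appear in a report.
sources = [
    "semalt.com",
    "semalt.semalt.com",
    "semaltmedia.com",
    "buttons-for-website.com",
    "google",
    "example.org",
]
flagged = [source for source in sources if is_spam_source(source)]
print(flagged)
```

Because the pattern is unanchored, it also catches subdomain variants such as semalt.semalt.com, which is the behavior you want for this kind of filter.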

Filter All Referral Spam Sources

In cases where domains in a measured view can easily change, blocking referral spam may require a more exhaustive referral filter encompassing all offending referral sites. Over the past few months, I’ve created a list of offending sites and updated the filter accordingly, as seen below. As a quick caveat, while this list targets many of the offending referral spam sources, it’s by no means an exhaustive list.

With the discovery of more spam referrals, I've updated the regular expressions below the image, and this solution will now require multiple Exclude Campaign Source filters, one for each regular expression listed.

In prior versions of this blog post, an Exclude Referral filter was recommended. The post has since been updated to use a more appropriate filter type, an Exclude Campaign Source filter. S/o to Jordan Strauss for pointing out the issue.

Google Analytics Referral Spam Filter

Featured Regular Expressions:

.*((darodar|priceg|buttons\-for(\-your)?\-website|makemoneyonline|blackhatworth|hulfingtonpost|o\-o\-6\-o\-o|(social|(simple|free|floating)\-share)\-buttons)\.com|econom\.co|ilovevitaly(\.co(m)?)|(ilovevitaly(\.ru))|(humanorightswatch|guardlink)\.org).*

Update #1 - I've added another regular expression since the first one has reached the 255 character limit.

.*((best(websitesawards|\-seo\-(solution|offer))|get\-free(\-social)?\-traffic(\-now)?|googlsucks)\.com|(domination|torture)\.ml|((rapidgator\-)?(general)?porn(hub(\-)?forum)?|4webmasters)\.(ga|tk|org|uni)|(buy\-cheap\-online)\.info).*

Update #2 - Yet another regular expression to include.

.*((event\-tracking|semalt(media)?|(100dollars|success)\-seo|chinese\-amezon|e\-buyeasy|rankings\-analytics|rednise|video\-\-production|theguardlan|webmaster\-traffic)\.com|traffic(monetize(r)?|2money)\.(org|com)|pops\.foundation|erot\.co).*

Update #3 - Getting pretty tired of having to add new regular expressions.

.*(((free\-)?(floating|get\-your\-social)\-(share\-)?buttons|hosting\-tracker|alibestsale)\.(com|info)|(justprofit|best\-seo\-software)\.xyz|snip\.to|adf\.ly|copyrightclaims\.org|(black\-friday|cyber\-monday)\.ga).*

Update #4 - More regular expressions.

.*((monitoring(-your)?-success|uptime|free-video-tool|hdmoviecams)\.com|(monetizationking|popads)\.net|rank-checker\.online|(marketland|dominateforex)\.ml|(ownshop|topquality|easycommerce)\.cf|increasewwwtraffic\.info|(unpredictable|getlamborghini)\.ga).*

Update #5 - Additional .xyz & .co spam.

.*((eu-cookie-law-enforcement|social-traffic).*\.xyz|teedle\.co).*
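Since each expression has to stay within the 255 character limit mentioned in Update #1, one way to keep the list manageable is to generate the expressions from a plain list of domains. The following is a minimal sketch of that idea, not the method used to write the expressions above: the build_patterns helper is hypothetical, and it produces flat alternations rather than the hand-grouped patterns in this post.

```python
import re

MAX_PATTERN_LENGTH = 255  # filter pattern limit mentioned in Update #1


def build_patterns(domains, limit=MAX_PATTERN_LENGTH):
    """Group escaped domains into alternations no longer than `limit`.

    A single domain longer than `limit` would still get its own
    (too long) pattern; that edge case is not handled here.
    """
    patterns, current = [], []
    for domain in sorted(set(domains)):
        escaped = re.escape(domain)
        candidate = "|".join(current + [escaped])
        if current and len(candidate) > limit:
            # Adding this domain would overflow: close the current pattern.
            patterns.append("|".join(current))
            current = [escaped]
        else:
            current.append(escaped)
    if current:
        patterns.append("|".join(current))
    return patterns


# A few domains from this post; the tiny limit just forces a visible split.
sample_domains = ["darodar.com", "teedle.co", "rank-checker.online", "semalt.com"]
patterns = build_patterns(sample_domains, limit=30)
for pattern in patterns:
    print(len(pattern), pattern)
```

Each generated pattern would go into its own Exclude Campaign Source filter. The trade-off is that flat alternations are longer than hand-grouped ones, so you end up with more filters in exchange for a list that is easier to maintain.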

Advanced Segments for Historical Data

Since filters only process data moving forward, use advanced segments to review historical data from before filters were implemented. Similar to the above solutions, decide which approach is most appropriate for your site and use regular expressions to remove sessions from referral spam, as seen below:

Google Analytics Advanced Segment
Featured Regular Expressions:

.*((darodar|priceg|buttons\-for(\-your)?\-website|makemoneyonline|blackhatworth|hulfingtonpost|o\-o\-6\-o\-o|(social|(simple|free|floating)\-share)\-buttons)\.com|econom\.co|ilovevitaly(\.co(m)?)|(ilovevitaly(\.ru))|(humanorightswatch|guardlink)\.org).*

Update #1 - I've added another regular expression since the first one has reached the 255 character limit.

.*((best(websitesawards|\-seo\-(solution|offer))|get\-free(\-social)?\-traffic(\-now)?|googlsucks)\.com|(domination|torture)\.ml|((rapidgator\-)?(general)?porn(hub(\-)?forum)?|4webmasters)\.(ga|tk|org|uni)|(buy\-cheap\-online)\.info).*

Update #2 - Yet another regular expression to include.

.*((event\-tracking|semalt(media)?|(100dollars|success)\-seo|chinese\-amezon|e\-buyeasy|rankings\-analytics|rednise|video\-\-production|theguardlan|webmaster\-traffic)\.com|traffic(monetize(r)?|2money)\.(org|com)|pops\.foundation|erot\.co).*

Update #3 - Getting pretty tired of having to add new regular expressions.

.*(((free\-)?(floating|get\-your\-social)\-(share\-)?buttons|hosting\-tracker|alibestsale)\.(com|info)|(justprofit|best\-seo\-software)\.xyz|snip\.to|adf\.ly|copyrightclaims\.org|(black\-friday|cyber\-monday)\.ga).*

Update #4 - More regular expressions.

.*((monitoring(-your)?-success|uptime|free-video-tool|hdmoviecams)\.com|(monetizationking|popads)\.net|rank-checker\.online|(marketland|dominateforex)\.ml|(ownshop|topquality|easycommerce)\.cf|increasewwwtraffic\.info|(unpredictable|getlamborghini)\.ga).*

Update #5 - Additional .xyz & .co spam.

.*((eu-cookie-law-enforcement|social-traffic).*\.xyz|teedle\.co).*

Note: Advanced Segments can be applied retroactively to historical data, while Filters only process data moving forward. If unfamiliar with segments and filters, a quick comparison summary between the two can be found here.
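The same expressions can also be applied outside of GA, for example to an exported report. The sketch below assumes hypothetical (source, sessions) rows and uses just two of the expressions from this post; it splits the rows into clean and spam traffic, roughly the way an exclude segment would.

```python
import re

# Two of the expressions from this post, copied verbatim.
SPAM_PATTERNS = [
    re.compile(r".*((eu-cookie-law-enforcement|social-traffic).*\.xyz|teedle\.co).*"),
    re.compile(r".*(semalt(media)?|buttons\-for\-website)\.com.*"),
]

# Hypothetical (source, sessions) rows standing in for an exported report.
rows = [
    ("google", 120),
    ("semalt.semalt.com", 40),
    ("social-traffic-3.xyz", 15),
    ("newsletter", 30),
    ("teedle.co", 5),
]


def split_rows(report_rows):
    """Separate rows whose source matches any spam pattern from the rest."""
    clean, spam = [], []
    for source, sessions in report_rows:
        is_spam = any(p.search(source) for p in SPAM_PATTERNS)
        (spam if is_spam else clean).append((source, sessions))
    return clean, spam


clean, spam = split_rows(rows)
clean_sessions = sum(sessions for _, sessions in clean)
spam_sessions = sum(sessions for _, sessions in spam)
print(f"clean sessions: {clean_sessions}, spam sessions: {spam_sessions}")
```

Comparing the two totals gives a quick sense of how much of the historical traffic was spam before the filters went in.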

Bot Filtering within View Settings

In July 2014, Google introduced bot and spider filtering to give users more accurate data. From the admin view interface, you can select this option, as seen below. This will exclude any sessions named in the IAB known bots and spiders list (at no extra cost for you).

In theory, this is great! However, this feature is still new, and we’re still seeing referral spam from some web crawlers make its way through the bot and spider filtering. That said, there's no harm in checking the box, especially if Google decides to introduce more functionality to this feature.

As an update to this segment, I've tested multiple views over the past month and have seen no discernible difference between views with Bot Filtering and those without Bot Filtering.

Google Analytics Bots and Spiders
For those familiar with Google Tag Manager, I'd highly recommend reading Sayf Sharif's post Eliminating Dumb Ghost Referral Traffic in Google Analytics.

Another ingenious GTM solution has been created by Simo Ahava: Spam Filter Insertion Tool.

List of Offending Sites/Wall of Shame

The current list of offenders includes:

.com

semalt, buttons-for-website, buttons-for-your-website, darodar, priceg, makemoneyonline, blackhatworth, hulfingtonpost, bestwebsitesawards, o-o-6-o-o, ilovevitaly, simple-share-buttons, free-share-buttons, social-buttons, best-seo-solution, best-seo-offer, Get-Free-Traffic-Now, googlsucks, theguardlan, webmaster-traffic, event-tracking, 100dollars-seo, semaltmedia, traffic2money, chinese-amezon, success-seo, get-free-social-traffic, video--production, rankings-analytics, free-floating-buttons, rednise, hosting-tracker, alibestsale, keywords-monitoring-your-success, uptime, free-video-tool, keywords-monitoring-success, hdmoviecams

.co

econom, ilovevitaly, erot, teedle

.ru

ilovevitaly

.org

humanorightswatch, 4webmasters, generalporn, guardlink, trafficmonetize, trafficmonetizer, copyrightclaims

.info

buy-cheap-online, get-your-social-buttons, increasewwwtraffic

.ml

domination, torture, marketland, dominateforex

.ga

pornhub-forum, youporn-forum, rapidgator-porn, depositfiles-porn, black-friday, cyber-monday, unpredictable, getlamborghini

.tk

pornhubforum

.uni.me

pornhubforum

.foundation

pops

.xyz

justprofit, best-seo-software, eu-cookie-law-enforcement2, social-traffic-3

.to

snip

.ly

adf

.online

rank-checker

.net

monetizationking, popads

.cf

ownshop, topquality, easycommerce

This Isn’t a Long-Term Solution

Unfortunately, the solutions above are just short-term band-aids at the moment. As spammers innovate, users of products like Google Analytics are in danger of falling prey to bogus referral traffic through more sophisticated means. Google and other web analytics providers will hopefully create new mechanisms to combat this referral spam. However, without some serious changes to the current system, the world of web analytics may be in for some unpleasant surprises.

Update: Google is working on a solution to resolve the referral spam issue. They're well aware of the problem and will be releasing tools in the coming months. I'll be sure to update this post whenever that work comes to fruition. 

For reasons why not to use the Referral Exclusion List and much more on referral spam in general, check out Mike Sullivan's guide: http://www.analyticsedge.com/2014/12/removing-referral-spam-google-analytics/

For a great visual walkthrough of referral spam solutions, view Carlos Escalera Alonso's recent write-up: http://www.ohow.co/what-is-referrer-spam-how-stop-it-guide/

