3 Important Google Analytics Data Quirks
According to BuiltWith’s Audience Measurement Technologies Usage Statistics, Google Analytics is used on 43% of the Top 10,000 websites and on 83% of all websites on the Internet. With so many sites leveraging Google Analytics, it is important to know the nuances of how data is collected, processed, and reported back. Knowing these intricacies can help you better understand the platform, analyze your audience, and determine the impact on key performance metrics.
Let’s take a look at a few of the most important Google Analytics nuances:
Sampling
Google Analytics is a free service (unless you have $150K to upgrade to Premium) and, as with any free service, there are caps built into the platform to restrict the processing power available to generate reports. For Google Analytics, the cap is a restriction on the number of Sessions that can be analyzed in the reports before a sampled data set is used to extrapolate results. This cap is 500,000 Sessions.
This means that if you ask Google Analytics to generate a report that includes more than 500,000 Sessions, the results in that report will be based on a subset of the actual user data. For example, if your report includes 1,000,000 Sessions, then Google will take a 500,000-Session subset and double those numbers to reach a million.
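The doubling in that example is simple proportional scaling. Here is a minimal sketch of the arithmetic, assuming the sample is scaled up uniformly. The function name and the 12,000 goal completions are made up for illustration, and Google's actual sampling logic may well be more involved.

```javascript
// Hypothetical helper (not part of Google Analytics): scale a metric that was
// counted in the sampled Sessions up to an estimate for all Sessions.
function extrapolateFromSample(totalSessions, sampledSessions, metricInSample) {
  if (sampledSessions <= 0 || sampledSessions > totalSessions) {
    throw new RangeError("sampledSessions must be between 1 and totalSessions");
  }
  const scaleFactor = totalSessions / sampledSessions;
  return {
    samplingRate: sampledSessions / totalSessions,
    scaleFactor,
    estimate: Math.round(metricInSample * scaleFactor),
  };
}

// 1,000,000 Sessions in the report, 500,000 actually analyzed.
// The 12,000 goal completions in the sample are an invented figure.
const result = extrapolateFromSample(1000000, 500000, 12000);
console.log(result.scaleFactor); // 2
console.log(result.estimate);    // 24000
```

The smaller the sample is relative to the full data set, the larger the scale factor, and the more any quirk in the sample gets magnified in the report.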
How do you tell if your data is Sampled?
Simply look for this message and “grid” icon to appear:
At first, the Sampling will be based on ~250,000 Sessions. This can be adjusted up to the 500,000 maximum by clicking the “grid” icon and moving the slider in the direction labeled “Higher Precision.”
Most standard reports within Google Analytics will allow you to analyze larger data sets due to the simplicity of the reports and low processing power used by Google. Sampling starts to take effect, typically, when applying Advanced Segments or Secondary Dimensions to the standard reports, or when you build Custom Reports that utilize a number of filters.
How can I avoid Sampling?
To get around Sampling you will have to pull your data in smaller sets by adjusting the date ranges so that you do not exceed the 500,000 Session cap.
This can be incredibly difficult for large organizations where the traffic to their site may exceed the cap on a weekly basis. In this case I would recommend either upgrading to Premium (which removes Sampling and gives you full processing power) or leveraging the Google Analytics API for Google Sheets to automate the data pulling and aggregation.
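If you go the automation route, the first step is breaking one long date range into smaller ones. Below is a rough sketch of that step only. The function name and the 7-day chunk size are my own assumptions, and it does not call any Google API; you would still need to check that each chunk stays under the Session cap for your site.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: split an inclusive date range (YYYY-MM-DD strings)
// into consecutive chunks of at most `chunkDays` days each.
function splitDateRange(start, end, chunkDays) {
  if (!Number.isInteger(chunkDays) || chunkDays < 1) {
    throw new RangeError("chunkDays must be a positive integer");
  }
  // Parse as UTC midnight so daylight saving changes cannot shift a day.
  const startMs = Date.parse(start + "T00:00:00Z");
  const endMs = Date.parse(end + "T00:00:00Z");
  if (Number.isNaN(startMs) || Number.isNaN(endMs) || startMs > endMs) {
    throw new RangeError("start and end must be valid dates with start <= end");
  }
  const toIsoDate = (ms) => new Date(ms).toISOString().slice(0, 10);

  const chunks = [];
  for (let cur = startMs; cur <= endMs; cur += chunkDays * MS_PER_DAY) {
    const chunkEnd = Math.min(cur + (chunkDays - 1) * MS_PER_DAY, endMs);
    chunks.push({ start: toIsoDate(cur), end: toIsoDate(chunkEnd) });
  }
  return chunks;
}

const january = splitDateRange("2015-01-01", "2015-01-31", 7);
console.log(january.length); // 5
console.log(january[4]);     // { start: '2015-01-29', end: '2015-01-31' }
```

Pull each chunk as its own report and add up the results. Keep in mind this only works cleanly for metrics that can be summed, such as Sessions or Goal completions; a metric like Users counts unique visitors, so adding weekly totals together would count returning visitors more than once.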
Direct Channel Attribution
Google Analytics, by default, utilizes a last-touch attribution model with one exception: Direct traffic. Google views Direct traffic to your website as the weakest channel for attribution because it says very little about the visitor. As a result, if a visitor has a previous traffic source besides Direct (e.g. Organic Search, Social, Referral, etc.), that previous source will be the channel that receives attribution.
You can see this attribution in action by comparing Goal completions by Default Channel Grouping versus Top Conversion Paths (located in Multi-Channel Funnel reporting). The below Default Channel Grouping report has attributed 11 conversions to Direct:
In contrast, the Top Conversion Paths report (set to All path lengths) shows 17 conversion paths that end in Direct:
In the cases where Direct is not the only channel present in the visitor’s path to conversion, the channel preceding Direct will get credit for the conversion. In the above case, 4 of the 17 conversions will be attributed to Organic Search, 1 to Social, and 1 to Referral, leaving the 11 shown for Direct in the Default Channel Grouping report.
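To make the rule concrete, here is a small sketch of "last touch, unless the last touch is Direct." The function name is mine, and the path list is reconstructed from the counts above rather than exported from a real report. It also ignores anything else Google does during processing, such as how far back it looks for a previous source.

```javascript
// Hypothetical helper: walk a conversion path backwards and return the most
// recent channel that is not Direct. Only an all-Direct path stays Direct.
function lastNonDirectChannel(path) {
  if (path.length === 0) {
    throw new Error("path must contain at least one channel");
  }
  for (let i = path.length - 1; i >= 0; i--) {
    if (path[i] !== "Direct") return path[i];
  }
  return "Direct";
}

// 17 paths that all end in Direct, rebuilt from the counts in this section.
const conversionPaths = [
  ...Array(11).fill(["Direct"]),
  ...Array(4).fill(["Organic Search", "Direct"]),
  ["Social", "Direct"],
  ["Referral", "Direct"],
];

const totals = {};
for (const path of conversionPaths) {
  const channel = lastNonDirectChannel(path);
  totals[channel] = (totals[channel] || 0) + 1;
}
console.log(totals);
// Direct: 11, Organic Search: 4, Social: 1, Referral: 1
```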
Interaction vs. Non-Interaction Events
There are two types of Events that Google Analytics Event Tracking can fire – Interaction and Non-Interaction Events.
By default, Events are of the Interaction type. This means when a visitor engages with your website causing Event Tracking to fire, these interactions will be used by Google Analytics to calculate metrics such as Bounce Rate, Average Session Duration, or Time on Page.
Conversely, a Non-Interaction Event is still fired and recorded, but it will not be used to calculate Bounce Rate or other time-based metrics.
A Non-Interaction Event is most commonly used when the Event is fired automatically on page load as part of an on-page funnel or other view-type metric important to your business objectives. When Interaction Events are implemented at page load you will typically see your Bounce Rate improve drastically:
This improvement is potentially a false positive. The newly introduced Interaction Event is now being counted in the Bounce Rate calculation, so sessions that used to register as bounces no longer do, even though visitor behavior has not changed.
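You can see the effect with a toy model. The sketch below assumes a bounce is a session with exactly one interaction hit, and it uses four invented sessions. The hit objects and function names are illustrative only, not how Google stores or processes your data.

```javascript
// Pageviews always count as interactions; events count unless they are
// flagged as non-interaction.
function isInteractionHit(hit) {
  return hit.type === "pageview" ||
    (hit.type === "event" && hit.nonInteraction !== true);
}

// Simplified model: a session bounces when it has exactly one interaction hit.
function bounceRate(sessions) {
  if (sessions.length === 0) return 0;
  const bounces = sessions.filter(
    (hits) => hits.filter(isInteractionHit).length === 1
  ).length;
  return bounces / sessions.length;
}

// Hypothetical helper: add an event that fires on every page load.
function withPageLoadEvent(sessions, nonInteraction) {
  return sessions.map((hits) =>
    hits.flatMap((hit) =>
      hit.type === "pageview" ? [hit, { type: "event", nonInteraction }] : [hit]
    )
  );
}

const pageview = { type: "pageview" };
// Three single-page sessions and one two-page session (made-up data).
const sessions = [[pageview], [pageview], [pageview], [pageview, pageview]];

console.log(bounceRate(sessions));                           // 0.75
console.log(bounceRate(withPageLoadEvent(sessions, false))); // 0
console.log(bounceRate(withPageLoadEvent(sessions, true)));  // 0.75
```

Nothing about the visitors changed between the three numbers. Only the Event setting did.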
How do I fix this issue?
To correct tracking issues involving Interaction or Non-Interaction Events, you need to change the Event Tracking code that is used to fire the misbehaving Event (or ensure a checkbox is checked if you use Google Tag Manager).
By default, your Event Tracking code could look like this for an Interaction Event:
ga('send', 'event', 'Videos', 'play', 'Fall Campaign');
To enable the Non-Interaction Event your code should be amended to:
ga('send', 'event', 'Videos', 'play', 'Fall Campaign', {nonInteraction: true});
The addition of the “nonInteraction: true” field, passed in the fieldsObject, tells Google Analytics this Event should no longer impact your calculated metrics.
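If you fire a lot of Events, one way to keep this consistent is to route them through a single helper that sets the flag for anything fired on page load. This is only a sketch: the helper names and the "Funnel" event are hypothetical, and a stand-in function takes the place of the real ga() so the example runs outside a browser. It uses the object form of the send command, where the whole hit is passed as one fields object.

```javascript
// Hypothetical helper: build the fields object for an Event hit and mark it
// as non-interaction when it fires automatically on page load.
function buildEventHit(category, action, label, firedOnPageLoad) {
  const hit = {
    hitType: "event",
    eventCategory: category,
    eventAction: action,
    eventLabel: label,
  };
  if (firedOnPageLoad) {
    hit.nonInteraction = true;
  }
  return hit;
}

// gaFn is whatever function sends hits; on a live page that would be ga.
function trackEvent(gaFn, category, action, label, firedOnPageLoad) {
  gaFn("send", buildEventHit(category, action, label, firedOnPageLoad));
}

// Stand-in for ga() that records the calls instead of sending them.
const sentHits = [];
const fakeGa = (...args) => sentHits.push(args);

trackEvent(fakeGa, "Videos", "play", "Fall Campaign", false); // visitor action
trackEvent(fakeGa, "Funnel", "view", "Step 1", true);         // fires on page load

console.log(sentHits[0][1].nonInteraction); // undefined
console.log(sentHits[1][1].nonInteraction); // true
```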
What Does This Mean?
Be confident in the data that you are using to make decisions about your digital strategy. As marketers, webmasters, or anyone involved in the digital strategy, be sure to vet your data sources and understand the nuances of how data is collected and reported. This could save you time, money, and major headaches down the road.
TL;DR
Here are three Google Analytics data quirks:
#1: Sampling – The free version of Google Analytics will process a maximum of 500,000 Sessions before using a subset of your data to extrapolate (i.e. guess) actual performance.
#2: Direct Channel – Google Analytics treats this as the “weakest” channel, so conversions where the last touch was Direct but was preceded by any other channel will not be attributed to Direct.
#3: Interaction vs. Non-Interaction Events – Event Tracking can be tricky. If Bounce Rate magically improves, check your Event Tracking for the non-interaction setting.