Implementing Google Analytics involves more than placing a small snippet on a website. There are different integration patterns for capturing data into Google Analytics, and each is subject to a number of pitfalls and potential regressions that need to be guarded against. There are also questions as to whether and how to use the different APIs that Google Analytics provides.
Besides a hardcoded Google Analytics implementation, there are three main integration patterns for tracking data with a tag manager and pushing it to Google Analytics.
One potential integration pattern for Google Analytics is scraping information from the website, typically through calls on the HTML DOM, but also by extracting data from URL structures and by attaching event listeners that capture users’ interactions with specific components on the website.
When going for this type of integration pattern, it is typical to enhance the information already available on the website with data attributes: hidden pieces of information added to the different elements of the HTML page that allow these elements to be surfaced to Google Analytics.
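As a sketch, a product list annotated with data attributes might look like this (the attribute names are hypothetical, not a fixed schema):

```html
<!-- Hypothetical product list annotated with data attributes -->
<ul class="product-list">
  <li class="product"
      data-product-id="sku-123"
      data-product-name="Blue T-Shirt"
      data-product-price="19.99">
    Blue T-Shirt
  </li>
  <li class="product"
      data-product-id="sku-456"
      data-product-name="Red Hoodie"
      data-product-price="39.99">
    Red Hoodie
  </li>
</ul>
```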
Above is an example setup of a list using data attributes.
The typical way of extracting this information and passing it to Google Analytics is through jQuery and a tag manager. Some tag managers, such as Tealium and Ensighten, have facilities in their UI to make these jQuery calls without having to write the code snippets directly in JavaScript, while GTM has a specific selector for events and DOM elements:
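A minimal sketch of this scraping approach, assuming the hypothetical data attributes above. The payload-building logic is kept as a pure function so it can be tested outside the browser, with the jQuery wiring shown in a comment:

```javascript
// Build a dataLayer event payload from an element's data attributes.
// Event and field names are hypothetical examples, not a fixed GA schema.
function buildProductClickEvent(dataset) {
  return {
    event: 'productClick',
    productId: dataset.productId,
    productName: dataset.productName,
    productPrice: dataset.productPrice
  };
}

// In the browser, a jQuery listener would wire it up:
// $('.product').on('click', function () {
//   window.dataLayer.push(buildProductClickEvent(this.dataset));
// });
```

Separating the payload construction from the DOM listener makes the fragile part (the selectors) easy to isolate when CSS classes change.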
Choosing this integration pattern allows you to start pulling data from the information already available on the page and to improve it as new information is progressively surfaced. The disadvantages are that it is extremely brittle, depending on front-end components such as CSS classes that can change throughout the lifecycle of a website, and that it increases the size of the container by adding all the scraping logic to it.
The “dataLayer” is a JavaScript object that serves to provide information to the tag manager. Google Tag Manager uses an object named dataLayer; other tag managers use their own objects, such as Tealium’s utag_data or Qubit’s universal_variable, while some tag managers, like Ensighten, can interact with any JS object present on your page.
Events can be pushed directly onto the dataLayer with a bit of JavaScript:
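For example, using the standard GTM pattern of ensuring the dataLayer exists before pushing (the event and field names below are hypothetical; in the browser this is usually written against `window.dataLayer`, while `globalThis` is used here so the snippet also runs outside a browser):

```javascript
// Ensure the dataLayer exists, then push an event onto it.
globalThis.dataLayer = globalThis.dataLayer || [];

// Hypothetical add-to-cart event; field names are examples, not a fixed schema.
globalThis.dataLayer.push({
  event: 'addToCart',
  productId: 'sku-123',
  quantity: 1,
  price: '19.99'
});
```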
This type of integration relies on code placed on the website and directly integrated as part of the different classes and functions used.
Relying on a direct dataLayer integration is generally more reliable than the alternatives. The events and attributes exposed can more easily be generalized, and it typically offers higher performance than adding scraping/listening scripts to a tag manager’s container.
The drawback is that it requires the website’s developers to integrate the logging of attributes and events directly into their codebase, which, depending on the organization, might not be the quickest or most flexible process and might require a “deployment” to be slotted in.
Structured data is a way to provide information related to your page that arose to help search engines index websites. The different data points abide by specified schemas that are meant to represent the most common actions and entities. This information is typically included in pages as linked-data JSON (ld+json):
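For instance, a product page might embed a schema.org Product entity as JSON-LD (the values below are illustrative):

```html
<!-- Example schema.org Product markup embedded as JSON-LD -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Blue T-Shirt",
  "sku": "sku-123",
  "offers": {
    "@type": "Offer",
    "price": "19.99",
    "priceCurrency": "USD"
  }
}
</script>
```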
This approach can also be leveraged to capture data for analytics purposes, effectively tying the analytics implementation to the SEO one. For good and bad, this allows for re-use but puts extra constraints on the analytics implementation.
Besides this, the integration pattern presents several advantages:
Like the dataLayer pattern, this integration pattern requires development on the website and is subject to the same types of development pitfalls. Another drawback is that it requires a custom implementation in the tag manager to read the structured-data object.
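Such a custom implementation could be sketched as follows, assuming a schema.org Product entity like the one shown earlier. The parsing is kept separate from the DOM access so it can be tested in isolation:

```javascript
// Parse a JSON-LD text block and map a schema.org Product to analytics fields.
// The output field names are a hypothetical choice, not a GA requirement.
function extractProduct(jsonLdText) {
  const data = JSON.parse(jsonLdText);
  if (data['@type'] !== 'Product') return null;
  return {
    productId: data.sku,
    productName: data.name,
    productPrice: data.offers && data.offers.price
  };
}

// In the browser, the JSON-LD block would be located first:
// const node = document.querySelector('script[type="application/ld+json"]');
// const product = extractProduct(node.textContent);
```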
Even after choosing a robust integration pattern, there are quite a few pitfalls to avoid in order to gather useful and accurate analytics:
Non-interaction events: Not properly defining certain events as non-interaction will affect core metrics such as bounce rate.
Duplicate events: Duplicate events might be sent to Google Analytics due to an implementation error; duplicate events can affect bounce rates and make funnels more complicated.
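One defensive pattern against duplicates, sketched below, is to keep a set of keys for events already sent and skip repeats (the key scheme is a hypothetical choice):

```javascript
// Track which event keys have already been pushed and skip repeats.
const sentEvents = new Set();

function pushOnce(dataLayer, evt, key) {
  if (sentEvents.has(key)) return false; // already sent, skip the duplicate
  sentEvents.add(key);
  dataLayer.push(evt);
  return true;
}
```

A transaction id makes a natural key for purchase events, since the same confirmation page may be reloaded.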
Unexpected page behavior: Unexpected behavior can be the cause of some implementation issues. Imagine that inputting a coupon or updating the quantity of an item on the cart page triggers a refresh: each change would then generate a new page view. This type of unexpected behavior, while not technically a wrong implementation, produces data that is difficult to interpret.
Mismatching IDs: One potential issue is the inconsistent use of IDs across events and pages. What is used as the product ID for a view-content event might be an internal product ID, while an SKU is used when adding to cart, and a variant ID is sent to Google Analytics when the purchase is finally made. This can impact your different reports and make it impossible to track behavior accurately across the funnel.
Inconsistent price formats: In some instances, an implementation contains inconsistent price formats across pages; sometimes even an invalid price is pushed. For example, in one implementation, the following three product price formats had been implemented on different pages:
Product prices should be numerically encoded strings without a currency symbol, in US number format.
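A sketch of a normalizer that converts common inconsistent formats into the expected plain numeric string (the inputs handled here are hypothetical examples, and the thousands/decimal-separator heuristic is a simplification):

```javascript
// Normalize a raw price string to a plain US-format numeric string.
function normalizePrice(raw) {
  // Drop currency symbols, spaces, and other non-numeric characters.
  let s = String(raw).replace(/[^0-9.,]/g, '');
  const lastComma = s.lastIndexOf(',');
  const lastDot = s.lastIndexOf('.');
  if (lastComma > lastDot && s.length - lastComma <= 3) {
    // Comma used as the decimal separator: "1 250,00" -> "1250.00"
    s = s.replace(/\./g, '').replace(',', '.');
  } else {
    // Comma used as a thousands separator: "$1,250.00" -> "1250.00"
    s = s.replace(/,/g, '');
  }
  return Number(s).toFixed(2);
}
```

Running this once at the point where prices enter the dataLayer keeps every page consistent.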
X-domain tracking: Cross-domain measurement is complicated to begin with, and nowadays some browsers have implemented measures against cross-domain tracking: Safari introduced ITP (“Intelligent Tracking Prevention”), and Firefox’s Private Browsing blocks different trackers.
One way to work around some of this cross-domain tracking protection is to host the resources on the same domain:
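For example, a reverse proxy can serve the analytics script from your own domain instead of a third-party host. The nginx location below is a hypothetical sketch of that setup, not a recommended production configuration:

```nginx
# Hypothetical: serve analytics.js from the site's own domain via a proxy,
# so the browser no longer loads it from a third-party host.
location /js/analytics.js {
    proxy_pass https://www.google-analytics.com/analytics.js;
    proxy_set_header Host www.google-analytics.com;
}
```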
Hit counts: If there is a plan to use raw data such as hit count or max hit count, you can run into issues when tracking has been implemented for multiple properties based on the same clientId. Google Analytics increments the hit count even when hits are not pushed to a specific property, to allow sessions to be merged in roll-up properties.
PII data: PII or tokens can end up being sent to Google Analytics, often inadvertently. By default, the full page URL is sent to Google Analytics on each page view. Google Analytics explicitly mandates that no PII be sent, and sending PII to Google Analytics might result in your account being blocked.
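A defensive sketch: strip known PII query parameters from the URL before it is sent. The parameter list is an assumption and would need to reflect the parameters your site actually uses:

```javascript
// Hypothetical list of query parameters that may carry PII on this site.
const PII_PARAMS = ['email', 'name', 'phone', 'token'];

// Remove PII parameters from a full URL before sending it to analytics.
function scrubUrl(url) {
  const u = new URL(url);
  PII_PARAMS.forEach((p) => u.searchParams.delete(p));
  return u.toString();
}

// In the browser, the scrubbed value could then override the tracked URL:
// ga('set', 'location', scrubUrl(window.location.href));
```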
Once the tracking is implemented, it also needs to be maintained. Because it is an invisible component, regressions can often be overlooked, and setting up protection against them through automated tests and monitoring of the website is essential to ensure good-quality tracking.
It is possible to set up automated tests on the dataLayer implementation. These can be set up as defensive measures, as part of the deployment flow, to ensure that code changes don’t negatively impact the tracking currently present on the website.
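Such a test can be as simple as asserting that the dataLayer contains the events a page is required to emit. A sketch of that check, which could run against a page loaded in a headless browser (the event names are hypothetical):

```javascript
// Return the required event names that are missing from the dataLayer.
function checkRequiredEvents(dataLayer, requiredEvents) {
  const seen = new Set(dataLayer.map((e) => e.event));
  return requiredEvents.filter((name) => !seen.has(name));
}

// A deployment check would fail the build if any events are missing:
// const missing = checkRequiredEvents(window.dataLayer, ['pageView', 'addToCart']);
// if (missing.length > 0) throw new Error('Missing events: ' + missing.join(', '));
```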
Setting up tracking monitoring for Google Analytics can be done in multiple ways. One is to rely on Google Analytics alerts; this, however, requires a well-trafficked website to be able to get relevant tracking alerts.
Website monitoring tools such as updown.io offer an alternative way to monitor that tracking is still present on the website.
Scheduled Selenium tests are another possibility to check that the tracking is still present on the website and conforms to the specific tracking requirements.
Implementing Google Analytics sometimes requires integrating with the Google Analytics APIs, be it for reporting purposes, to push backend data, or to provide cost or product information. Google Analytics has three main APIs for these purposes.
The Reporting API is Google’s way of allowing programmatic reporting on GA data. It lets you query the datasets similarly to a custom report in Google Analytics; the fields are named differently, however, and the names can be checked using the Dimensions & Metrics Explorer. Users of Google Analytics 360 may opt to use its BigQuery export capabilities rather than the Reporting API.
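As an illustration, a Reporting API v4 request body querying sessions by traffic source over the last week might look like this (the viewId is a placeholder):

```json
{
  "reportRequests": [
    {
      "viewId": "123456789",
      "dateRanges": [{ "startDate": "7daysAgo", "endDate": "today" }],
      "metrics": [{ "expression": "ga:sessions" }],
      "dimensions": [{ "name": "ga:source" }]
    }
  ]
}
```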
The Measurement Protocol allows you to push data directly to Google Analytics. This can be used, for example, to push transaction data that did not occur directly on the website, or to handle all of the clickstream logging through a backend service.
Google provides a tool, the Hit Builder, to set up the different types of Measurement Protocol requests.
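A sketch of building a Measurement Protocol (v1) event hit from a backend; the tracking id and client id are placeholders, and the event category/action values are hypothetical:

```javascript
// Assemble a Measurement Protocol v1 event hit as a query string.
const params = new URLSearchParams({
  v: '1',                    // protocol version
  tid: 'UA-XXXXX-Y',         // property / tracking id (placeholder)
  cid: '555',                // anonymous client id (placeholder)
  t: 'event',                // hit type
  ec: 'backend',             // event category (hypothetical)
  ea: 'refund_processed',    // event action (hypothetical)
  ev: '1'                    // event value
});

const url = 'https://www.google-analytics.com/collect?' + params.toString();
// The hit would then be sent to `url` with an HTTP GET or POST request.
```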
The Management API allows you to perform tasks such as automating data imports or managing remarketing audiences. This API enables you to upload cost information from external ad providers such as Facebook, product catalog uploads, or additional user data.
If you have Google Analytics 360, this functionality further allows you to perform query-time imports and essentially enrich some of the data available within Google Analytics with master data.