Clickstream event collectors are applications that let you collect raw clickstream data from the front-end of an application. There are multiple reasons to rely on these event collectors, and setting them up isn’t that complex.
There is both an overlap and a complementary value between these solutions. Snowplow and Divolte are event collectors: they collect the raw data needed for deep analysis. Raw data can be used to increase the depth of analysis; it can, for instance, be leveraged for conversion rate optimization (CRO), where it allows for a granular analysis of the different customer touch points using click path analysis.
Exporting raw clickstream data is a feature offered in Google Analytics 360, but not in the free version. If you are only looking at acquiring the 360 version for this feature, going the Snowplow route might turn out to be more cost-effective.
The second complementary value lies in using them to potentially bypass some of the tracking restrictions of ad blockers. Since these are open-source solutions that need to be self-hosted, you can easily set them up on your own domain.
Setting them up on your own domain allows you to bypass domain blacklists, and modifying the tracking script, or serving a different one, lets it avoid checksum detection. Further customizing the tracking script’s name allows it to get past ad blockers that look for “track” or “analytics” in script names.
These solutions allow the data to be ingested as a data stream, and by creating a small application it is possible to push this data back to Google Analytics or other analytics tools.
The clickstream collectors’ ability to push data to a message broker such as Kafka allows them to be included as an integral part of an application. The types of applications that could rely on this data range from real-time reporting and real-time marketing triggers to real-time (prediction) model evaluation.
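As an illustration, a small consumer application could read the collected events from the broker and replay page views into Google Analytics. The sketch below is only illustrative: it assumes the kafka-python client and the legacy Universal Analytics Measurement Protocol, and the topic name, property ID and event fields are made up for the example.

```python
import json
import requests
from kafka import KafkaConsumer  # pip install kafka-python

# Hypothetical topic name and GA property ID, adjust to your own setup
TOPIC = "clickstream-events"
GA_PROPERTY_ID = "UA-XXXXXXX-1"

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    event = message.value
    if event.get("event_type") != "pageview":
        continue
    # Replay the page view through the (legacy) Measurement Protocol
    requests.post(
        "https://www.google-analytics.com/collect",
        data={
            "v": "1",                   # protocol version
            "tid": GA_PROPERTY_ID,      # tracking/property ID
            "cid": event["client_id"],  # anonymous client identifier
            "t": "pageview",            # hit type
            "dp": event["page_path"],   # document path
        },
        timeout=5,
    )
```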
Tracking Script: The tracking script’s role is to capture the different actions performed by users browsing the website and to push these events to the clickstream collector API.
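The tracking script itself is normally a small piece of JavaScript, but the payload it emits is easy to mimic. The snippet below, a sketch with a made-up collector URL and event schema, simulates the kind of event a tracking script would POST, which is handy for testing the collector without a browser.

```python
import time
import uuid
import requests

# Hypothetical collector endpoint and event fields, for testing only
COLLECTOR_URL = "https://collector.example.com/events"

event = {
    "event_id": str(uuid.uuid4()),
    "event_type": "pageview",
    "client_id": "test-client-123",
    "page_path": "/products/blue-widget",
    "referrer": "https://www.google.com/",
    "timestamp": int(time.time() * 1000),
}

response = requests.post(COLLECTOR_URL, json=event, timeout=5)
response.raise_for_status()
```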
Clickstream collector API: A clickstream collector API is merely a receiving endpoint that might perform 1) request authorization and 2) schema validation before pushing the data to a message broker for ingestion.
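Snowplow and Divolte ship their own collectors, but a minimal sketch of such an endpoint helps show what happens behind it. The example below assumes Flask, the jsonschema library and kafka-python, with a deliberately simplified schema and illustrative topic name.

```python
import json
from flask import Flask, request, jsonify
from jsonschema import validate, ValidationError
from kafka import KafkaProducer

app = Flask(__name__)

# Minimal event schema; a real collector would enforce a much stricter one
EVENT_SCHEMA = {
    "type": "object",
    "properties": {
        "event_type": {"type": "string"},
        "client_id": {"type": "string"},
        "page_path": {"type": "string"},
    },
    "required": ["event_type", "client_id"],
}

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

@app.route("/events", methods=["POST"])
def collect():
    event = request.get_json(force=True)
    try:
        validate(instance=event, schema=EVENT_SCHEMA)  # schema validation
    except ValidationError as err:
        return jsonify({"error": err.message}), 400
    producer.send("clickstream-events", value=event)   # hand off to the broker
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)
```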
Message broker: A message broker allows for the asynchronous processing of the data. One of the most popular message brokers for this kind of data is Apache Kafka. Applications can directly consume the data stream to compute real-time aggregates or to filter the stream.
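Filtering the stream can be as simple as a small consumer/producer pair; the sketch below assumes kafka-python and keeps only purchase events, republishing them to a dedicated topic (topic names are placeholders).

```python
import json
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "clickstream-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

# Keep only purchase events and publish them to their own topic
for message in consumer:
    event = message.value
    if event.get("event_type") == "purchase":
        producer.send("purchase-events", value=event)
```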
Data Sink: A data sink takes the incoming data from the message broker and pushes it to the storage layer. This is usually an S3 bucket on AWS, a Data Lake Storage account on Azure, or plain HDFS files.
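In practice an off-the-shelf connector (Kafka Connect, for instance) would handle this, but a naive sink is easy to picture. The sketch below buffers events from Kafka and writes them to S3 as newline-delimited JSON using boto3; the bucket, topic and batch size are placeholders.

```python
import json
import time
import boto3
from kafka import KafkaConsumer

BUCKET = "my-clickstream-bucket"  # placeholder bucket name
BATCH_SIZE = 500

s3 = boto3.client("s3")
consumer = KafkaConsumer(
    "clickstream-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= BATCH_SIZE:
        # Write the batch as newline-delimited JSON, partitioned by date
        key = f"clickstream/{time.strftime('%Y/%m/%d')}/{int(time.time())}.json"
        body = "\n".join(json.dumps(event) for event in batch)
        s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode("utf-8"))
        batch = []
```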
Storage Layer: The storage layer provides long-term storage for the incoming data. Most compute engines in the Hadoop ecosystem, such as Presto or Spark, and their cloud equivalents, such as AWS Athena, are able to query files on bucket storage.
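Once the files land on the storage layer they can be queried directly. The sketch below uses PySpark and assumes the cluster is already configured with S3 access and that the sink above wrote newline-delimited JSON under the placeholder prefix shown.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("clickstream-exploration").getOrCreate()

# Read the newline-delimited JSON files written by the data sink
events = spark.read.json("s3a://my-clickstream-bucket/clickstream/")
events.createOrReplaceTempView("events")

# Example: most viewed pages
top_pages = spark.sql("""
    SELECT page_path, COUNT(*) AS views
    FROM events
    WHERE event_type = 'pageview'
    GROUP BY page_path
    ORDER BY views DESC
    LIMIT 20
""")
top_pages.show()
```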
At their core, these solutions can be set up as containerized applications; both Snowplow and Divolte provide Docker images. These can be deployed on a VM, on a container service such as Azure Container Instances, or on Kubernetes or Docker Swarm. The containers should sit behind a load balancer for autoscaling and interact with a message queue. The docker-compose file provided by Divolte, for example, bootstraps a Kafka instance.
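For a quick local trial, the collector container can be started with a couple of commands. The sketch below uses the Docker SDK for Python and assumes the divolte/divolte-collector image and its default port 8290; verify both against the project’s documentation.

```python
import docker  # pip install docker

client = docker.from_env()

# Pull and run the collector container, exposing its HTTP endpoint locally.
# Image name and port are assumptions; check the Divolte documentation.
container = client.containers.run(
    "divolte/divolte-collector",
    detach=True,
    ports={"8290/tcp": 8290},
    name="divolte-collector",
)
print(container.status)
```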
From a front-end perspective, there are different methods for implementing these clickstream loggers.
Using a clickstream event collector can bring a lot of benefits in terms of how your data is tracked, the granularity of the data available, and the ability to make the data available in real time to applications. There are open-source solutions for this that can easily be deployed as Docker containers.