5 Tips For The Best Data Layer Implementation

July 11, 2021

When writing custom code to capture Analytics for your website, one of the most important elements is a Data Layer. A Data Layer is a JavaScript object that contains information about the way users interact with your website. This is especially important for large, complex websites where many different combinations of interactions can occur.

Why do we need a Data Layer?

Analytics reporting suites such as Google Analytics tend to work best when deployed alongside a Data Layer. The Google Tag Manager documentation states:

Rather than referencing variables, transaction information, page categories, and other important signals scattered throughout your page, Tag Manager is designed to easily reference information that you include in your data layer source code. Implementation of a data layer with variables and associated values, ensures that they will be available as soon as you need them to fire tags.
https://support.google.com/tagmanager/answer/6164391?hl=en

But not all websites need a Data Layer. If the default information gathered by your reporting suite is sufficient, then a Data Layer only adds unnecessary technical debt. A Data Layer can be expensive to maintain. It requires a developer to write custom Analytics code for your website in its current state. But Analytics is ongoing, so later you will likely need a developer to write more custom Analytics code for your website in its future state.

Define your schema

Before writing any code, you must determine what exactly is going to be in your Data Layer. There are 3 key pieces of information that must be established:

What properties are required
How these properties should be structured
What values these properties should expect

It is best to follow a pre-established specification as a guide. The Customer Experience Digital Data Layer specification, published by a W3C Community Group, recommends that you define a global object called digitalData, and store each category of information in a dedicated nested object. Some of these nested objects may be:

pageInfo – Contains information about the page being viewed, e.g. pageName
product – Contains information about the products that a user intends to purchase, e.g. productName
event – Contains information about actions a user performed on your website, e.g. clicking a button

Example – You run an e-commerce website, and you want to track the products a user has added to their cart. Here is how the digitalData object might look:

window.digitalData = {
    event: {
        eventName: "Add Item to Cart",
        eventAction: "click",
    },
    pageInfo: {
        pageName: "home page",
        pageUrl: "https://www.example.com/",
    },
    product: [
        {
            productID: "1",
            productName: "Pokemon Cards",
        },
        {
            productID: "2",
            productName: "Yugioh Cards",
        },
    ],
};

Selecting the right Data Layer schema from the start is important, as it is the foundation that much of your analytics implementation will be built around.
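As a rough illustration of how a schema like the one above could be populated, here is a minimal sketch. The DigitalData interface and the recordAddToCart helper are hypothetical names invented for this example; they are not part of any specification or library.

```typescript
// Hypothetical shapes mirroring the example digitalData object above.
interface DigitalData {
    event: { eventName: string; eventAction: string };
    pageInfo: { pageName: string; pageUrl: string };
    product: { productID: string; productName: string }[];
}

// Hypothetical helper: returns a new object with the product appended
// and the event fields describing the click that added it.
const recordAddToCart = (
    current: DigitalData,
    productID: string,
    productName: string,
): DigitalData => ({
    ...current,
    event: { eventName: "Add Item to Cart", eventAction: "click" },
    product: [...current.product, { productID, productName }],
});

// State before the user clicks "Add to Cart" on the second product.
const cartBefore: DigitalData = {
    event: { eventName: "", eventAction: "" },
    pageInfo: { pageName: "home page", pageUrl: "https://www.example.com/" },
    product: [{ productID: "1", productName: "Pokemon Cards" }],
};

const cartAfter = recordAddToCart(cartBefore, "2", "Yugioh Cards");
```

Because every update goes through one function that knows the schema, the structure stays consistent no matter which part of the website triggers the event.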

Use types

JavaScript is a dynamically typed language, which means that a property in the Data Layer can take any value. TypeScript, on the other hand, lets you constrain the allowed values for a given property. This is useful since developers get immediate feedback if they try to assign the wrong type of value to a property, or if they try to modify a property that does not exist.

For example, say we want to capture information using this Data Layer schema:

DataLayer:
    page:
        pageName
        pageUrl
    productInfo:
        product1:
            name
            price
        product2:
            name
            price

We can translate this into code using interfaces:

interface PageData {
    pageName: string;
    pageUrl: string;
}

interface ProductInfo {
    name: string;
    price: number;
}

interface DataLayer {
    page: PageData;
    productInfo: ProductInfo[];
}

const initDataLayer = (): DataLayer => {
    return {
        page: {
            pageName: "",
            pageUrl: "",
        },
        productInfo: [],
    };
};

This is especially important for large websites whose Data Layer will be modified by many developers, since it essentially documents which values are allowed in the Data Layer.

For example, one developer may assume that dataLayer.productInfo[0].price is a string, while another assumes that dataLayer.productInfo[0].price is a number. This issue is resolved by leveraging the strictness of TypeScript:

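const dataLayer: DataLayer = initDataLayer();

// productInfo starts empty, so add a product before updating its price
dataLayer.productInfo.push({ name: "Pokemon Cards", price: 0 });

dataLayer.productInfo[0].price = 20.49;    // Valid
dataLayer.productInfo[0].price = "$20.49"; // Type error at compile time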

You can also use literal types for an even stricter Data Layer.

interface UserData {
    checkoutStatus: "start" | "in progress" | "complete";
}

interface DataLayer {
    user: UserData;
}

const dataLayer: DataLayer = initDataLayer();

dataLayer.user.checkoutStatus = "in progress"; // Valid
dataLayer.user.checkoutStatus = "progress";    // Type error at compile time
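Literal types are only checked at compile time, so a value that arrives as a plain string at runtime (for example from a data attribute) still needs a check before it is assigned. The sketch below shows one possible way to do that with a type guard. The names CHECKOUT_STATUSES, isCheckoutStatus and applyCheckoutStatus are hypothetical and exist only for this example.

```typescript
// The allowed values are listed once; the literal union is derived from the list.
const CHECKOUT_STATUSES = ["start", "in progress", "complete"] as const;
type CheckoutStatus = (typeof CHECKOUT_STATUSES)[number];

// Hypothetical type guard: narrows an arbitrary string to CheckoutStatus at runtime.
const isCheckoutStatus = (value: string): value is CheckoutStatus =>
    (CHECKOUT_STATUSES as readonly string[]).indexOf(value) !== -1;

// Hypothetical helper: assigns the status only when the raw value is allowed.
// Returns true if the Data Layer was updated, false if the value was rejected.
const applyCheckoutStatus = (
    user: { checkoutStatus: CheckoutStatus },
    raw: string,
): boolean => {
    if (!isCheckoutStatus(raw)) {
        return false; // Leave the Data Layer untouched
    }
    user.checkoutStatus = raw;
    return true;
};
```

Deriving the type from the array keeps the compile-time and runtime checks in sync, since there is only one list of allowed values to maintain.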

Centralised Data Layer access

When dealing with a large object such as a Data Layer, it is better to avoid directly mutating it. This reduces the risk of side effects being introduced.

We can centralise the access by hiding our global Data Layer object, and exposing a single entry point in the form of a method, updateDataLayer.

import { ObjProxyArg, set } from "ts-object-path";

const globalDataLayer = initDataLayer();

export const updateDataLayer = <T>(proxy: ObjProxyArg<DataLayer, T>, context: Partial<T>): void => {
    set(globalDataLayer, proxy, context);
};

For more information about ts-object-path and ObjProxyArg, you can refer to the documentation or check out this article. But essentially this allows us to retain strong typing when a user calls updateDataLayer.

updateDataLayer((dl) => dl.productInfo[0].price, 20.49);    // Valid
updateDataLayer((dl) => dl.productInfo[0].price, "$20.49"); // Type error at compile time
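If you would rather not add a dependency, the same idea can be sketched without ts-object-path, at the cost of only updating one top-level section at a time. This is a simplified illustration, not the approach used above: updateSection and readDataLayer are hypothetical names, and in a real module they would be the only exports.

```typescript
interface PageData {
    pageName: string;
    pageUrl: string;
}

interface ProductInfo {
    name: string;
    price: number;
}

interface DataLayer {
    page: PageData;
    productInfo: ProductInfo[];
}

// Kept private to this file: other code cannot reach it directly.
const globalDataLayer: DataLayer = {
    page: { pageName: "", pageUrl: "" },
    productInfo: [],
};

// Hypothetical single entry point for writes. The generic K ties the value's
// type to the chosen section, so mismatched updates fail to compile.
const updateSection = <K extends keyof DataLayer>(section: K, value: DataLayer[K]): void => {
    globalDataLayer[section] = value;
};

// Hypothetical read accessor. Callers receive a deep copy, so editing the
// returned object never changes the real Data Layer.
const readDataLayer = (): DataLayer =>
    JSON.parse(JSON.stringify(globalDataLayer)) as DataLayer;

updateSection("page", { pageName: "home page", pageUrl: "https://www.example.com/" });
```

The JSON round trip is a simple way to copy plain data, but it would not preserve values such as Date objects or functions, so it only suits a Data Layer made of strings, numbers, booleans, arrays and plain objects.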

Validate with unit tests

It should go without saying, but you need to write tests for your Analytics code. A Data Layer is an ongoing commitment, as it will likely need to be modified over time to accommodate changes to your website. Sooner or later, a new developer will likely need to work on the Analytics code, and it will make their job a lot easier if there are tests that validate its current behaviour.

Here are 3 reasons that you should write tests for your Analytics code:

Quality – For Analytics reporting to have any value, you must ensure that the data you are sending is correct. To verify that you are sending the correct data, you need tests.
Bug catching – Analytics will typically take place across the whole website, so it is easy to inadvertently break it. The sooner a developer notices a bug, the better.
Future proofing – Over time it is easy for business requirements to become lost or forgotten. Having unit tests essentially embeds business requirement documentation within the code, which means that future developers can understand what they are working with.

When a developer makes a change that affects the UI, it will likely be picked up before it has any hope of making it to production. But Analytics takes place in the background, so breaks cannot be detected visually. If a developer inadvertently changes the Data Layer behaviour, and there are no tests to inform them that the behaviour has changed, then it can quite easily make it into production. Once that happens, the break likely won’t be spotted until an Analyst detects an anomaly in the data, which could take months.
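To give a feel for what such a test might look like, here is a minimal, framework-free sketch. The addProduct function and the expectEqual helper are hypothetical; in a real project you would more likely use the assertion helpers of a test runner such as Jest.

```typescript
interface ProductInfo {
    name: string;
    price: number;
}

interface DataLayer {
    productInfo: ProductInfo[];
}

// Hypothetical function under test: returns a new Data Layer with the product appended.
const addProduct = (dataLayer: DataLayer, product: ProductInfo): DataLayer => ({
    ...dataLayer,
    productInfo: [...dataLayer.productInfo, product],
});

// Minimal stand-in for a test runner's assertion: compares values by their JSON form.
const expectEqual = (actual: unknown, expected: unknown, label: string): void => {
    const a = JSON.stringify(actual);
    const e = JSON.stringify(expected);
    if (a !== e) {
        throw new Error(`${label}: expected ${e} but received ${a}`);
    }
};

// The test documents a requirement: adding a product appends it
// and leaves the existing entries and the input object untouched.
const testAddProductAppends = (): void => {
    const initial: DataLayer = { productInfo: [{ name: "Pokemon Cards", price: 20.49 }] };
    const updated = addProduct(initial, { name: "Yugioh Cards", price: 15 });

    expectEqual(updated.productInfo.length, 2, "product count");
    expectEqual(updated.productInfo[1], { name: "Yugioh Cards", price: 15 }, "new product");
    expectEqual(initial.productInfo.length, 1, "input is not mutated");
};

testAddProductAppends();
```

If someone later changes addProduct so that it overwrites the list instead of appending to it, this test fails immediately, long before an Analyst would notice the missing products in a report.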

Error Handling

Analytics should never disrupt the user's journey. Any errors that arise as a result of Analytics must be handled gracefully. Because Analytics can take place in many different parts of the website, there are many opportunities for errors to arise.

Watch out for ad blockers. In the age of data privacy concerns, more and more users are using software to block websites from tracking them with Analytics. For example, Pi-hole blocks network requests to common Analytics-hosting platforms. When a network request is blocked, it results in a network error. If you are not handling this error, then the whole website may stop working for that user.

Circling back to the third point, centralising the Data Layer access, the best way to have trustworthy Analytics error handling is to use a centralised method for all Analytics-related changes. Then all you need to do is add a try/catch block to this method. That way, if something goes wrong, you can just log the error and proceed without disrupting the user's journey.

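const updateDataLayer = <T>(proxy: ObjProxyArg<DataLayer, T>, context: Partial<T>): void => {
    try {
        // Apply the update; any error thrown here is caught below
        set(globalDataLayer, proxy, context);
    } catch (e) {
        // Log and carry on so the user's journey is not interrupted
        console.error(e);
    }
};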
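One caveat: a plain try/catch only catches errors thrown synchronously. A blocked network request usually surfaces as a rejected promise, which needs to be awaited inside the try block to be caught. The sketch below shows one way this might be handled. The runAnalyticsSafely wrapper is a hypothetical helper written for this example, not part of any library.

```typescript
// Hypothetical wrapper: runs any Analytics task, synchronous or asynchronous,
// and never lets its failure propagate to the caller.
// Resolves to true if the task succeeded and false if it failed.
const runAnalyticsSafely = async (task: () => void | Promise<void>): Promise<boolean> => {
    try {
        // Awaiting here turns a rejected promise into a catchable error
        await task();
        return true;
    } catch (e) {
        // Log and carry on so the user's journey is not interrupted
        console.error("Analytics error ignored:", e);
        return false;
    }
};

// Simulates a tracking request that an ad blocker has rejected.
const blockedRequest = (): Promise<void> => Promise.reject(new Error("request blocked"));

// The rejection is logged, and the rest of the page keeps working.
runAnalyticsSafely(blockedRequest);
```

Returning a boolean rather than rethrowing is a design choice: callers that care can react to a failure, while callers that do not can ignore the result without risking an unhandled rejection.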

If you want to try some of this code for yourself, check out this sample repository.

Lachie
