5 Tips For The Best Data Layer Implementation
When writing custom code to capture Analytics for your website, one of the most important elements is a Data Layer. A Data Layer is a JavaScript object that contains information about the way users interact with your website. This is especially important for large, complex websites where many different combinations of interactions can occur.
Why do we need a Data Layer?
Analytics reporting suites such as Google Analytics tend to work best when deployed alongside a Data Layer. The Google Tag Manager documentation states:
Rather than referencing variables, transaction information, page categories, and other important signals scattered throughout your page, Tag Manager is designed to easily reference information that you include in your data layer source code. Implementation of a data layer with variables and associated values, ensures that they will be available as soon as you need them to fire tags.
https://support.google.com/tagmanager/answer/6164391?hl=en
But not all websites need a Data Layer. If your Analytics data is sufficient using the default information gathered by your Report suite, then a Data Layer will result in unnecessary technical debt. A Data Layer can be expensive to maintain. It requires a developer to write custom Analytics code for your website in its current state. But Analytics is ongoing, so later you will likely need a developer to write more custom Analytics code for your website in its future state.
Define your schema
Before writing any code, you must determine what exactly is going to be in your Data Layer. There are 3 key pieces of information that must be established:
- What properties are required
- How these properties should be structured
- What values these properties should expect
It is best to follow a pre-established standard as a guide. The W3C standard recommends that you define a global object called digitalData, and store each category of information in a dedicated nested object. Some of these nested objects may be:
- pageInfo – Contains information about the page being viewed, e.g. pageName
- product – Contains information about the products that a user intends to purchase, e.g. productName
- event – Contains information about actions a user performed on your website, e.g. clicking a button
Example – You run an e-commerce website, and you want to track the products a user has added to their cart. Here is how the digitalData object might look:
// Attach the object to window so that it is explicitly global
window.digitalData = {
  event: {
    eventName: "Add Item to Cart",
    eventAction: "click",
  },
  pageInfo: {
    pageName: "home page",
    pageUrl: "https://www.example.com/",
  },
  product: [
    {
      productID: "1",
      productName: "Pokemon Cards",
    },
    {
      productID: "2",
      productName: "Yugioh Cards",
    },
  ],
};
Selecting the right Data Layer schema from the start is important, as it is the foundation that much of your analytics implementation will be built around.
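As a rough sketch of how a schema like this might be populated, the snippet below types the example object and adds products to it one at a time. The `DigitalData` and `Product` interfaces and the `recordAddToCart` helper are hypothetical names chosen for illustration; they are not part of the W3C guidance.

```typescript
// Hypothetical types mirroring the example digitalData object above.
interface Product {
  productID: string;
  productName: string;
}

interface DigitalData {
  event: { eventName: string; eventAction: string };
  pageInfo: { pageName: string; pageUrl: string };
  product: Product[];
}

// Hypothetical helper: returns a new object describing an "Add Item to Cart"
// click and leaves the object that was passed in untouched.
function recordAddToCart(current: DigitalData, added: Product): DigitalData {
  return {
    ...current,
    event: { eventName: "Add Item to Cart", eventAction: "click" },
    product: [...current.product, added],
  };
}

const initial: DigitalData = {
  event: { eventName: "", eventAction: "" },
  pageInfo: { pageName: "home page", pageUrl: "https://www.example.com/" },
  product: [],
};

const afterFirst = recordAddToCart(initial, { productID: "1", productName: "Pokemon Cards" });
const afterSecond = recordAddToCart(afterFirst, { productID: "2", productName: "Yugioh Cards" });
```

After the second call, `afterSecond` has the same shape as the example object, which is one way to check that the three schema decisions (properties, structure, expected values) hold up in practice.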
Use types
JavaScript is a dynamically typed language, which means that a property in the Data Layer can take any value. TypeScript, on the other hand, lets you constrain the allowed values for a given property. This is useful because developers get immediate compile-time feedback if they try to assign the wrong type of value to a property, or if they try to modify a property that does not exist.
For example, say we want to capture information using this Data Layer schema:
DataLayer:
  page:
    pageName
    pageUrl
  productInfo:
    product1:
      name
      price
    product2:
      name
      price
We can translate this into code using interfaces:
interface PageData {
  pageName: string;
  pageUrl: string;
}
interface ProductInfo {
  name: string;
  price: number;
}
interface DataLayer {
  page: PageData;
  productInfo: ProductInfo[];
}
const initDataLayer = (): DataLayer => {
  return {
    page: {
      pageName: "",
      pageUrl: "",
    },
    productInfo: [],
  };
};
This is especially important for large websites whose Data Layer will be modified by many developers, since it essentially documents which values are allowed in the Data Layer.
For example, one developer may assume that dataLayer.productInfo[0].price is a string, while another assumes that dataLayer.productInfo[0].price is a number. This issue is resolved by leveraging the strictness of TypeScript:
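const dataLayer: DataLayer = initDataLayer();
dataLayer.productInfo.push({ name: "Pokemon Cards", price: 0 }); // productInfo starts empty
dataLayer.productInfo[0].price = 20.49; // Valid
dataLayer.productInfo[0].price = "$20.49"; // Compile error: string is not assignable to number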
You can also use literal types for an even stricter Data Layer.
interface UserData {
  checkoutStatus: "start" | "in progress" | "complete";
}
interface DataLayer {
  user: UserData;
}
const dataLayer: DataLayer = { user: { checkoutStatus: "start" } };
dataLayer.user.checkoutStatus = "in progress"; // Valid
dataLayer.user.checkoutStatus = "progress"; // Compile error: not one of the allowed literals
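TypeScript types are erased at compile time, so a value that arrives from an untyped source (a DOM attribute or an API response, for example) is not checked by the literal type alone. One possible way to cover that gap is a type guard. The names `CHECKOUT_STATUSES`, `isCheckoutStatus` and `setCheckoutStatus` below are hypothetical and only illustrate the idea.

```typescript
// The allowed values live in one array, and the type is derived from it,
// so the compile-time type and the runtime check cannot drift apart.
const CHECKOUT_STATUSES = ["start", "in progress", "complete"] as const;
type CheckoutStatus = (typeof CHECKOUT_STATUSES)[number];

interface UserData {
  checkoutStatus: CheckoutStatus;
}

// Hypothetical type guard: narrows an unknown value to CheckoutStatus.
function isCheckoutStatus(value: unknown): value is CheckoutStatus {
  return typeof value === "string" &&
    CHECKOUT_STATUSES.some((status) => status === value);
}

// Hypothetical helper: writes the status only when the raw value is allowed,
// and reports whether the write happened.
function setCheckoutStatus(user: UserData, raw: unknown): boolean {
  if (!isCheckoutStatus(raw)) {
    return false;
  }
  user.checkoutStatus = raw;
  return true;
}

const user: UserData = { checkoutStatus: "start" };
const accepted = setCheckoutStatus(user, "in progress"); // allowed value
const rejected = setCheckoutStatus(user, "progress");    // not an allowed value
```

Here the second call leaves `user.checkoutStatus` unchanged, so an unexpected value never reaches the Data Layer.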
Centralised Data Layer access
When dealing with a large object such as a Data Layer, it is better to avoid directly mutating it. This reduces the risk of side effects being introduced.
We can centralise the access by hiding our global Data Layer object, and exposing a single entry point in the form of a method, updateDataLayer.
import { ObjProxyArg, set } from "ts-object-path";
const globalDataLayer = initDataLayer();
export const updateDataLayer = <T>(
  proxy: ObjProxyArg<DataLayer, T>,
  context: Partial<T>
): void => {
  set(globalDataLayer, proxy, context);
};
For more information about ts-object-path and ObjProxyArg, you can refer to the documentation or check out this article. But essentially this allows us to retain strong typing when a user calls updateDataLayer.
updateDataLayer((dl) => dl.productInfo[0].price, 20.49); // Valid
updateDataLayer((dl) => dl.productInfo[0].price, "$20.49"); // Compile error: string is not assignable to number
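If adding a dependency is not an option, a similar single entry point can be sketched with plain TypeScript. This is a simplified alternative rather than the approach shown above: it only updates whole top-level sections, and `updateSection` and `getDataLayer` are hypothetical names.

```typescript
interface PageData {
  pageName: string;
  pageUrl: string;
}
interface ProductInfo {
  name: string;
  price: number;
}
interface DataLayer {
  page: PageData;
  productInfo: ProductInfo[];
}

// Kept in module scope and never exported, so other modules cannot mutate it.
let globalDataLayer: DataLayer = {
  page: { pageName: "", pageUrl: "" },
  productInfo: [],
};

// Hypothetical entry point: the caller names a top-level section and supplies
// a value whose type must match that section.
const updateSection = <K extends keyof DataLayer>(section: K, value: DataLayer[K]): void => {
  const next: DataLayer = { ...globalDataLayer };
  next[section] = value;
  globalDataLayer = next;
};

// Read access hands out a deep copy, so callers cannot change the original.
const getDataLayer = (): DataLayer =>
  JSON.parse(JSON.stringify(globalDataLayer)) as DataLayer;

updateSection("productInfo", [{ name: "Pokemon Cards", price: 20.49 }]); // Valid
// updateSection("productInfo", [{ name: "Pokemon Cards", price: "$20.49" }]); // Compile error

const snapshot = getDataLayer();
snapshot.productInfo[0].price = 0; // Only the copy changes
```

The trade-off is granularity: the proxy-based version can target a single nested property, while this sketch replaces an entire section per call.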
Validate with unit tests
It should go without saying, but you need to write tests for your Analytics code. A Data Layer is an ongoing commitment, as it will likely need to be modified over time to accommodate changes to your website. Sooner or later, a new developer will likely need to work on the Analytics code, and it will make their job a lot easier if there are tests that validate its current behaviour.
Here are 3 reasons that you should write tests for your Analytics code:
- Quality – For Analytics reporting to have any value, you must ensure that the data you are sending is correct. To verify that you are sending the correct data, you need tests.
- Bug catching – Analytics will typically take place across the whole website, so it is easy to inadvertently break it. The sooner a developer notices a bug, the better.
- Future proofing – Over time it is easy for business requirements to become lost or forgotten. Having unit tests essentially embeds business requirement documentation within the code, which means that future developers can understand what they are working with.
When a developer makes a change that affects the UI, it will likely be picked up before it has any hope of making it to production. But Analytics takes place in the background, so breaks cannot be detected visually. If a developer inadvertently changes the Data Layer behaviour, and there are no tests to inform them that the behaviour has changed, then it can quite easily make it into production. Once that happens, the break likely won’t be spotted until an Analyst detects an anomaly in the data, which could take months.
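To make this concrete, here is a minimal sketch of what such tests might look like. The `addProduct` function is a hypothetical piece of Analytics code, and `runTest` and `expectEqual` are small stand-ins for the helpers that a test runner such as Jest would normally provide.

```typescript
interface ProductInfo {
  name: string;
  price: number;
}
interface DataLayer {
  productInfo: ProductInfo[];
}

// Hypothetical function under test: records a product added to the cart.
const addProduct = (dataLayer: DataLayer, product: ProductInfo): DataLayer => ({
  ...dataLayer,
  productInfo: [...dataLayer.productInfo, product],
});

// Minimal stand-ins for a test runner's helpers.
const results: string[] = [];
const runTest = (name: string, body: () => void): void => {
  try {
    body();
    results.push(`PASS ${name}`);
  } catch (error) {
    results.push(`FAIL ${name}: ${String(error)}`);
  }
};
const expectEqual = <T>(actual: T, expected: T): void => {
  if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    throw new Error(`expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
  }
};

// Each test name documents a business requirement.
runTest("adding a product appends it with a numeric price", () => {
  const next = addProduct({ productInfo: [] }, { name: "Pokemon Cards", price: 20.49 });
  expectEqual(next.productInfo, [{ name: "Pokemon Cards", price: 20.49 }]);
});

runTest("adding a product does not mutate the previous Data Layer", () => {
  const before: DataLayer = { productInfo: [] };
  addProduct(before, { name: "Yugioh Cards", price: 15 });
  expectEqual(before.productInfo, []);
});

console.log(results.join("\n"));
```

If someone later changes `addProduct` so that the price becomes a string, the first test fails immediately instead of the problem surfacing months later in a report.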
Error Handling
Analytics should never disrupt the user's journey. Any errors that arise as a result of Analytics must be handled gracefully. Because Analytics can take place in many different parts of the website, there are many opportunities for errors to arise.
Watch out for ad blockers. With growing concern about data privacy, more and more users run software that blocks websites from tracking them with Analytics. For example, Pi-hole blocks network requests to common Analytics-hosting platforms. A blocked request results in a network error, and if you do not handle this error, the whole website may stop working for that user.
Circling back to the third tip, centralised Data Layer access, the best way to have trustworthy Analytics error handling is to use a centralised method for all Analytics-related changes. Then all you need to do is add a try/catch block to this method. That way, if something goes wrong, you can just log the error and proceed without disrupting the user's journey.
const updateDataLayer = <T>(proxy: ObjProxyArg<DataLayer, T>, context: Partial<T>): void => {
  try {
    set(globalDataLayer, proxy, context);
  } catch (e) {
    // Log and carry on so that the user's journey is not interrupted
    console.error(e);
  }
};
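A try/catch around synchronous code does not catch a promise that rejects later, which is how a blocked network request usually shows up. The sketch below is one way to cover both cases; `safeAnalytics` and `blockedRequest` are hypothetical names, and the rejected promise only simulates a request intercepted by an ad blocker.

```typescript
// Hypothetical wrapper: runs any Analytics task and ensures that neither a
// synchronous throw nor a rejected promise escapes to the caller.
const safeAnalytics = async (
  task: () => void | Promise<void>,
  onError: (error: unknown) => void = (error) => console.error(error),
): Promise<void> => {
  try {
    await task();
  } catch (error) {
    onError(error);
  }
};

// Collects errors so that the demo below can be inspected.
const failures: unknown[] = [];

// Simulates a tracking request that was rejected by an ad blocker.
const blockedRequest = (): Promise<void> =>
  Promise.reject(new Error("network request blocked"));

const demo = (async () => {
  // Asynchronous failure: the rejection is caught inside safeAnalytics.
  await safeAnalytics(blockedRequest, (error) => failures.push(error));
  // Synchronous failure: the throw is caught in the same way.
  await safeAnalytics(() => {
    throw new Error("bad payload");
  }, (error) => failures.push(error));
})();
```

Both failures end up in `failures` and neither propagates, so the calling code carries on as if Analytics had succeeded.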
If you want to try some of this code for yourself, check out this sample repository.