GA4 – GDPR Compliance | Facing The Hard Truth

Will Rice
First published March 16th, 2023
Last updated October 31st, 2023
Learn how GA4 uses cookies and how this affects GDPR, whether Google Analytics is actually free & how to capture your own events.

Want to become a better marketer? Today, we’ll be looking at the truth about Google Analytics 4 (GA4). We’ll answer the question ‘what can GA4 do for you?’ and try to dispel some of the false beliefs that many people hold about GA4.

Ultimately, we’ll show what you need to know as a marketer to build a successful (and fully GDPR compliant) data collection strategy. You’ll get better data so that you can boost your ROI. So, let’s dive in!

This is a write up of the talk given by Zach Randall at GA4ward MKII. You can find his slides here, and the recording is below too:

 

Let’s start with a story…

Six months ago, a very large company came to us wanting to use our platform to solve their measurement challenges. Ultimately, ListenLayer took on their entire measurement strategy and implemented it end to end. The engagement was a success. At the end of the process, they asked for help migrating to GA4.

We flipped a switch, pressed a couple of buttons, and the data started flowing into GA4. We carried out all the testing ourselves: we used the preview tool, checked the network calls, and made sure data was going into GA4. Unfortunately, when we jumped back into GA4 a week later, things hadn’t quite gone to plan.

 

Existing event

 

Our events table didn’t look like the image above. But there wasn’t any clear reason why. We were certain that we were getting the right data. After all, we saw the data in our testing. We decided to wait a little longer to see if the data was just delayed. We waited a week and still, nothing was showing up. By now, the client was getting a little antsy.

Reporting Identity

 

So, what was the problem? Well, the answer, believe it or not, was actually easy. We had to pull up the ‘Reporting Identity’ settings and switch from blended to device-based.  Within a few minutes of doing this, data started to enter the events table.

 

But why did this happen?

The issue was down to what Google refers to as ‘thresholds’. Essentially, with certain reporting identities (blended, in our case), GA4 only shows data in a report once enough of it has been collected. The strange thing was that we never had any issues with events in other reports.

 

What is GA4 really capable of?

Google Analytics 4 had caused a lot of heartache and potentially ruined a client relationship. At this point we started to ask ‘what are the pitfalls of GA4?’ and ‘how can we benefit?’. Google seems to be forcing us to walk across hot coals whilst saying ‘you will like it!’.

 

3 Truths

So, what is it that GA4 actually can do for you? Let’s try and dispel some of the misinformation with three truths:

  • Cookie confusion – The hard truth about GA4’s progress when it comes to cookies and consent and the General Data Protection Regulation (GDPR).
  • The “free” fallacy – Google has made this idea of free data a problem in our industry. By making it free, they have effectively devalued data. People aren’t willing to pay for good data. There are a number of pitfalls that come with using a free tool. We’ll try to break them down and show how you can overcome them.  
  • The afterthought – We’ll show two specific actions that you can take to be proactive in your GA4.

 

Cookie confusion

It’s not uncommon to find an article with a headline along the lines of ‘cookies are going away!’. Let’s establish something now: cookies are not going away. In reality, only third-party cookies are going away. Understanding the difference between the two is extremely important.

Cookies themselves are not inherently evil. When used properly, they’re actually critical to how a lot of websites function.

You’d think that, for walking over hot coals, Google would offer some reward. Perhaps an improvement in how it uses cookies, or a boost to GDPR compliance in GA4.

This brings us to our first false belief…

 

GA4 is cookie-less

Google Analytics 4 is absolutely not cookie-less. It is not configured or set up to enable a cookieless future. Above all, GA4 is not GDPR compliant. Let’s break this down.

Cookie usage on website

 

This is from Google’s documentation. They say specifically that the JavaScript library uses first-party cookies.

Google has made some improvements. They allow you to customize the cookie’s expiration date (you can go from a two-year cookie down to a 25-day cookie). You can also change whether an expiration is relative to the most recent or earliest session.

But none of these changes are worth throwing a party over. They don’t increase compliance or accomplish a great deal.

Expiration of Cookies

Remember: whilst third-party cookies are going away, first-party cookies are staying. GA4 uses first-party cookies, so the loss of third-party cookies isn’t going to change much.

The reality is, however, that GDPR doesn’t differentiate between first-party and third-party cookies. So, the question becomes: ‘why isn’t GA GDPR compliant?’.

 

A complex issue

Many would point to the issue of IP storage. The argument goes: because GA4 doesn’t collect IP addresses, it must be GDPR compliant. But this isn’t true at all. There are many more layers of complexity to the problem.

Firstly, there’s the cookie question. As we’ve just established, GA is still using cookies. This means that you have to ask for explicit consent. You’ll be getting gaps in user data due to those who don’t consent (as well as those who don’t consent straight away).

There is also the data transfer issue. You might have heard that last year the CNIL (the French data protection authority) ruled that the use of Google Analytics breached the GDPR because of its data transfers to the United States.

 

Official press release

 

Here is the official press release from one of those judgments. Within the context of using GA, unique identifiers are assigned to each visitor. The identifier constitutes personal data. Even if we solve the IP address issue, we still have a problem with the personal identifier.  

This has led to some speculation…

 

Google may try to replace cookies with Google Signals

If Google can replace cookies, they can get rid of the identifier. The data transfer issue becomes a non-problem. Problem solved! Or is it? Let’s look at Google Signals in more detail.

 

Google signal

 

This is a quote from Breen Baker, a product manager at Google who works on the GA4 team. Let’s break this down.

If a user comes to your website from two different devices, they are likely going to be assigned two GA IDs. Google is then going to deduplicate those from a cross-device tracking standpoint. They’re using Signals to do this.

Let’s say that you have some users on your website that are logged in with a Google account. But you also have a lot of users that are not logged into their accounts. Google cannot identify a user simply because they are logged into their Google account. Google is using the logged-in status for other purposes, such as deduplicating users across devices.

Digging into your settings in Chrome, you’ll see that there is certain information that can be derived from your history. But Google doesn’t have the ability to record your screen or track every event that occurs and trace it back to a user’s account. In short, Signals, as Google currently uses it, cannot replace cookies.

But there is one situation where they could do this. This would be if Google decided to replace cookies with a ‘fingerprint’.

 

Is fingerprinting bad?

Fingerprinting is simply an alternative way to identify a browser in order to associate it with server-side data. Fingerprinting really is no different than a cookie. It essentially carries out the same tasks without placing a cookie on a user’s browser.

But what makes a fingerprint bad (or good)? We’ll give two examples in the form of questions.  

Are you using fingerprinting to identify users as people, without asking for their consent? This would be a bad way of using fingerprinting. You’re misleading users and following them around.

Are you using fingerprinting to group events to conceptualize a user for analysis? This is the better way of using it: events are grouped so that you can analyze a ‘user’ without identifying the person behind them.

 

The ListenLayer Approach

De-identification

 

This is something that we have spent a lot of time studying and building into ListenLayer. When we first started to build an analytics engine, foundationally we had to identify an actual user.

There are certain situations where we don’t use cookies and we are still able to identify users. We do this by using a ‘soft fingerprint’. That is, we’re not pulling every piece of information about a user from their computer. Instead, it’s just a list of general things about a browser.

The important thing is that we pair this up with time destruction. Think of this as a ticking time bomb. You put this together with the fingerprinting information and keep it client side. We then hash this before it gets sent to our server.

The hashing function, when paired with the time destruction, is what allows us to de-identify users. Once the defined time lapses, the identifier self-destructs, and you can never trace it back to a user.

We can de-identify events, without using cookies, and we can aggregate them. Above all, we can track conceptualized users before they opt-in.
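To make the idea concrete, here is a minimal Python sketch of a soft fingerprint paired with time destruction. It is not ListenLayer’s implementation: the attribute names, the one-day window, and the names `RotatingSalt` and `soft_fingerprint` are all assumptions for illustration. In practice the attributes would be gathered and hashed in the browser, as described above; Python is used here only to show the logic.

```python
import hashlib
import secrets
from typing import Optional


class RotatingSalt:
    """Holds a random salt that is thrown away when its time window ends."""

    def __init__(self, window_seconds: int = 86_400) -> None:
        self.window_seconds = window_seconds
        self._window: Optional[int] = None
        self._salt = b""

    def salt_for(self, now: float) -> bytes:
        window = int(now // self.window_seconds)
        if window != self._window:
            # New window: the old salt is overwritten, so hashes made with it
            # can no longer be reproduced or linked to new ones.
            self._window = window
            self._salt = secrets.token_bytes(32)
        return self._salt


def soft_fingerprint(attributes: dict, salt: bytes) -> str:
    # Sort keys so the same attributes always produce the same input string.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(salt + canonical.encode("utf-8")).hexdigest()


# Hypothetical "general things about a browser".
browser = {"language": "en-GB", "timezone": "Europe/London", "screen": "1920x1080"}

salts = RotatingSalt(window_seconds=86_400)
same_day_a = soft_fingerprint(browser, salts.salt_for(now=1_000.0))
same_day_b = soft_fingerprint(browser, salts.salt_for(now=2_000.0))
next_day = soft_fingerprint(browser, salts.salt_for(now=90_000.0))
```

Within one window, the same browser attributes hash to the same value, so events can be grouped into a conceptual user. Once the window rolls over, the salt is gone and the earlier hashes cannot be recomputed from the browser attributes.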

 

The fingerprinting solution

Fingerprinting solution

 

This is how you are able to track and collect behavioural data and remain GDPR compliant. Specifically, this ensures that you’re not using unique identifiers that are traceable to users. You’re not placing any cookies or collecting personally identifiable information.

Could Google put this solution into action? There’s no doubt. The company has some of the most intelligent engineers in the world.

Google signal works

 

But there’s something we need to remember about what Google is doing with Signals. It is being used for targeting purposes. Whether Google fixes its cookie problems or not, there are still issues with how it uses the data.

 

The “free” fallacy


 

Nothing in life is truly free. Let’s look at a real-world legal example when it comes to exchanging your data for a free tool. This is what the relationship with Google looks like. You enter an agreement with Google, where you get free use of their platform. In return, you’re effectively giving them use of your data.

This would meet the legal definition of a contractual economic relationship. There’s an exchange occurring here. Your Google Analytics settings, and how your data is being used are where the rubber meets the road.

 

Google product & services

 

To manage this, you’ll need to go to ‘Account Settings’. From here, you can enable or disable Google using your data in their products and services. In 99% of situations, you’ll want to disable this.

Something to note is that Google says ‘if you disable this option, data can still flow to the Google products linked to your property’.

Modeling contributions

 

There’s a second option where you can let Google model things based on your data. Here Google explicitly says that they will aggregate and de-identify your data. This reinforces the idea that Google is collecting unaggregated data about your users. And this data is identifiable.

 

Have you heard about selling & sharing?

 

The beauty company Sephora was fined $1.2 million by the State of California. The company had violated the selling provision of California’s privacy law.

The case was settled outside of court. There isn’t a great deal of information about what the issue was here. But it’s clear, based on the opinion of California’s Attorney General, what was going on.

Sephora was not taking and selling data from its users. Instead, they were taking data and giving it to partners. The partners were then using this to power remarketing and advertising. This is the same as what you can do in GA.

You’re using a free tool and giving data in exchange for a purpose outside of the tool. Because of this, all of a sudden, you’re falling under this selling provision.

California has also amended its privacy law, the CCPA, through the CPRA, which came into force in 2023. Thanks to the combination of CCPA and CPRA, we have a much clearer definition. Sharing data about your users with a third party becomes illegal in certain situations.

 

Selling and sharing

 

Take a minute to read the extract in the image above. The new law clarifies that sharing data with someone who then uses it for advertising falls under this law.

 

Global Privacy Control

 


 

You’re probably thinking ‘thank goodness this is in California and doesn’t apply to me’. Well, unfortunately, California is starting to follow the Global Privacy Control (GPC).

GPC is a signal sent by the user’s browser, and a lot of jurisdictions are now starting to adopt laws similar to California’s. Luckily, GPC is really simple. It either doesn’t exist, or it is set to ‘true’. It isn’t broken into lots of different categories like GDPR.

And if GPC is ‘true’, the meaning is very simple: ‘selling and sharing’ is not allowed.
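As a rough illustration of how simple the signal is, the sketch below checks for it on the server side. Browsers that support GPC send the request header `Sec-GPC: 1`. The function name `gpc_opted_out` and the plain dictionary of headers are assumptions for the example, not part of any particular framework.

```python
def gpc_opted_out(headers: dict) -> bool:
    """Return True when the request carries the GPC signal (Sec-GPC: 1)."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    # The signal is either absent or set; there are no sub-categories.
    return normalized.get("sec-gpc") == "1"


with_signal = gpc_opted_out({"Sec-GPC": "1", "User-Agent": "ExampleBrowser"})
without_signal = gpc_opted_out({"User-Agent": "ExampleBrowser"})
```

Here `with_signal` is `True` and `without_signal` is `False`, which mirrors the ‘set or missing’ nature of GPC.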

 

The facts

  • Using a free tool like GA implies that you might be selling your data under certain jurisdictions. Bear this in mind as jurisdictions start to implement legislation relating to selling or sharing data.
  • The act of sharing your data with other Google properties through your GA account for potential advertising is enough to trigger the selling/sharing provision.  
  • The act of explicitly sharing your data with Google causes the same.

 

What’s the correct structure?

Correct structure

Google popularized the categories seen above based on GDPR and CCPA. We start to introduce this idea of selling and sharing. But how on earth does it apply here? Let’s break this down.  

Selling and sharing are not exclusive to analytics or advertising. They should form a separate category, where any cookie or data point can be flagged as selling and sharing.

 

The correct approach

 

In ListenLayer you can assign a tool like Google Analytics to the analytics category. You can flag that you are using the service to sell or share customer data.  

 

The truth

If you don’t completely separate your behavioural analytics from advertising data, you must block your behavioural analytics under GPC jurisdictions when opted out.
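One way to picture this rule is as a small decision function. The sketch below is an assumption-laden illustration, not how ListenLayer works internally: the `Tool` class, the `sells_or_shares` flag, and the example tool names are all hypothetical. The point it shows is that selling/sharing is tracked separately from the consent category, so a GPC opt-out blocks a flagged tool even when its category has consent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    category: str          # e.g. "analytics" or "advertising"
    sells_or_shares: bool  # flagged separately from the category


def allowed_tools(tools, consented_categories, gpc_opted_out):
    """Return the names of the tools that may run for this visitor."""
    allowed = []
    for tool in tools:
        if tool.category not in consented_categories:
            continue  # no consent for this category
        if gpc_opted_out and tool.sells_or_shares:
            continue  # GPC forbids selling/sharing, whatever the category
        allowed.append(tool.name)
    return allowed


tools = [
    Tool("Google Analytics (linked to Google Ads)", "analytics", True),
    Tool("Siloed behavioural analytics", "analytics", False),
    Tool("Ad platform pixel", "advertising", True),
]

consent = {"analytics", "advertising"}
with_gpc = allowed_tools(tools, consent, gpc_opted_out=True)
without_gpc = allowed_tools(tools, consent, gpc_opted_out=False)
```

With GPC set, only the siloed behavioural analytics tool survives; without it, all three tools run.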

At this point, you need to ask yourself the big question. Should we really centralize our data and rely only on Google Analytics or the Google Tag (a single tag that lets you do everything across Google Ads, GA, and other tools)? The problem is that you’ll be automatically sharing data between GA, Google Ads, and others.

None of us like to deal with these privacy laws. But the more we share our data with centralized organizations, the more compliance issues arise. Regulators will inevitably start addressing these issues.

 

The afterthought

Let’s talk about two things that you can do to be proactive within GA.

 

#1 – Own your data

Own your data

Go into your account settings and make sure that the box shown above is unticked (unless you’re feeling charitable and want to give your data to Google). But don’t be fooled. Just disabling this doesn’t mean that your data isn’t being used in a way that falls under data regulations.

 

 

If you don’t want data to fall foul of GPC regulations, make sure you don’t use the above connections. Unfortunately, a lot of the benefits of using Google Analytics relate to its connections. So, whether this is recommended depends on your situation.

The other option is to use a secondary tool for your behavioural analytics. This can help you fill the gap left by users who have GPC enabled.

 

Export your data to BigQuery

 

Lastly, export your data to BigQuery, even if you aren’t going to use it.

 

#2 – Capture your own events

The power of GA4 is using your own data. So, start to get creative and come up with ways that you can feed data in ways that are specific to you.

Engaged session

 

This is Google’s definition of an engaged session. But let’s be honest, it isn’t great. Someone that converts is engaged, as is someone viewing two or more pages. But someone that views a webpage for more than ten seconds isn’t necessarily engaged.

 

Ultra engaged

 

Instead, we like to create our own event. Here the threshold is that a user must have had the tab visible for 30 seconds or more and have scrolled at least 20% of the page. With this stricter definition, you might go from a 56% engagement rate to 45%.
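The difference between the two definitions can be sketched in a few lines of Python. The `Session` fields, the function names, and the four sample sessions are made up for illustration, and tab-visible time stands in for session duration in Google’s rule, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Session:
    visible_seconds: float     # time the tab was visible
    max_scroll_percent: float  # deepest scroll position reached, 0-100
    page_views: int
    converted: bool


def google_engaged(s: Session) -> bool:
    # Google's definition as described above:
    # more than 10 seconds, a conversion, or 2+ page views.
    return s.visible_seconds > 10 or s.converted or s.page_views >= 2


def ultra_engaged(s: Session) -> bool:
    # Stricter custom rule: tab visible for 30s or more AND scrolled 20% or more.
    return s.visible_seconds >= 30 and s.max_scroll_percent >= 20


def engagement_rate(sessions, is_engaged) -> float:
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if is_engaged(s)) / len(sessions)


sessions = [
    Session(12, 5, 1, False),   # counts for Google only
    Session(45, 60, 3, False),  # counts under both definitions
    Session(4, 0, 1, False),    # counts under neither
    Session(35, 10, 1, False),  # long visit but barely scrolled
]

google_rate = engagement_rate(sessions, google_engaged)
ultra_rate = engagement_rate(sessions, ultra_engaged)
```

On this toy data the rate drops from 0.75 to 0.25, a far bigger gap than you would expect on real traffic, but it shows why the stricter event produces a lower and arguably more honest number.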

 

 

Don’t filter out internal users as you’ll lose the data. Instead, register a user property as a dimension. This will allow you to identify internal users based on their IP addresses.
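A minimal sketch of that idea follows, assuming the check happens somewhere you can see the visitor’s IP address, such as your own server. The network range is a reserved documentation range standing in for an office network, and the property name `internal_user` is hypothetical. The returned dictionary follows the name-to-value shape that the GA4 Measurement Protocol uses for user properties; you would still need to register the property as a custom dimension in GA4.

```python
import ipaddress

# Hypothetical office range (203.0.113.0/24 is reserved for documentation).
INTERNAL_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def internal_user_property(ip: str) -> dict:
    """Build a user property marking whether the visitor is internal."""
    address = ipaddress.ip_address(ip)
    is_internal = any(address in network for network in INTERNAL_NETWORKS)
    # The traffic is kept either way; it is only labelled so that reports
    # can include or exclude internal users later.
    return {"internal_user": {"value": "true" if is_internal else "false"}}


office_visit = internal_user_property("203.0.113.7")
public_visit = internal_user_property("198.51.100.1")
```

Because the label travels with the data rather than filtering it out, you can always build a report with or without internal users.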

 

Multiple versions of the same event to GA

 

Lastly, we are sending multiple versions of the same event to GA. The reason is that it helps us to learn how reporting is working. The image above shows an example of where we are collecting all form submissions as one event.  

But we’ve also broken out specific, categorized form submissions as separate events. We’re also playing with a standard Google event called generate_lead to power certain types of reporting.
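As a sketch of what sending multiple versions of the same event might look like, the function below builds one generic event, one categorized event, and (for some categories) a generate_lead event from a single form submission. Apart from generate_lead, every name here is an assumption: `form_submission`, the category suffix, the parameter names, and the `LEAD_CATEGORIES` set are illustrative only. Each event uses the name-plus-params shape that GA4 events take.

```python
# Hypothetical set of form categories that should also count as leads.
LEAD_CATEGORIES = {"contact", "demo"}


def form_submission_events(form_category: str, form_id: str) -> list:
    """Return every version of the event to send for one form submission."""
    params = {"form_id": form_id, "form_category": form_category}
    events = [
        # 1. One generic event that collects all form submissions.
        {"name": "form_submission", "params": dict(params)},
        # 2. A categorized version of the same submission.
        {"name": f"form_submission_{form_category}", "params": dict(params)},
    ]
    if form_category in LEAD_CATEGORIES:
        # 3. Google's standard lead event, to power lead reporting.
        events.append({"name": "generate_lead", "params": dict(params)})
    return events


contact_names = [e["name"] for e in form_submission_events("contact", "footer-form")]
newsletter_names = [e["name"] for e in form_submission_events("newsletter", "popup")]
```

A contact form produces all three events, while a newsletter sign-up produces only the generic and categorized versions, so you can compare how each one surfaces in reporting.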

 

Wrapping up

To wrap up, let’s return to our 3 truths.

  • Cookie confusion – Google Analytics is not cookieless nor does it solve GDPR.
  • The “free” fallacy – Whether or not you should be sharing data between GA and Google Ads depends on your situation. Do you want to remove GA from your site when someone has GPC set to ‘true’? Or, do you want to separate and silo those data pools?
  • The afterthought  – You can be proactive in GA4 by owning your data and capturing your own events.

With these truths, you can get better data and boost your ROI. So, why not consider how they apply to your use of GA4?

 

Further Reading

Want more? Check out our blog for tips on Google Analytics, GTM, and a whole host of other Google packages.

 

About Zach Randall

Zach Randall is the Founder & CEO of ListenLayer – an end-to-end measurement platform that enables marketers to boost their ROI by giving them the data they’ve always wanted (but can’t get from other platforms).

He is a veteran digital marketer with over 15 years of experience building and running an agency that was recently acquired at an 8-figure valuation.  Zach loves helping other businesses achieve their growth goals through data & analytics – but he is a husband and father first.
