Measured Direction Podcast

Episode #4 - Testing Clicks

Download the Episode

In this episode, we discuss Testing Methodology, specifically Experimental Sample Sizing. Then we move on to Clicks vs. Session Metrics Discrepancies and general analytics discrepancies.

Show Links:
#measure Slack

Transcript

Jason Rose:
This is Measured Direction. I'm Jason Rose, a content strategist here at Digital Surgeons.

Jason Rose:
I'm joined, as always, by Tom Miller, the leader of our analytics practice. What's up, Tom?

Tom Miller:
What's up? Looking forward to it, as always: going over the questions that you guys have submitted through bit.ly slash measured direction or on Twitter using the hashtag #measureddirection.

Jason Rose:
Once again, that's b-i-t dot l-y slash measured direction, or the hashtag #measureddirection, where you guys can submit the questions that we can put to Tom and see what we can work on.

Tom Miller:
Let's do it. I always cut myself short. All right, cool.

Jason Rose:
So, first question: when you're testing something like a landing page but only have a small amount of traffic, how do you determine when the test is finished? This question was an anonymous one from someone here at Digital Surgeons.

Tom Miller:
OK, that's a pretty good question. Testing and optimization is something that we do a lot here at DS as part of my analytics practice. So when you're talking about a landing page, or really any page that you're testing, certainly the volume of traffic coming to that page is a consideration. And the reason that is the case is because when you make a change to a page and evaluate it via an experiment, which is how testing tools are set up, you need a certain divergence in the effect of what you're trying to test from the control in order to demonstrate significance. Right. The way a testing tool works is it takes your incoming traffic and separates it into a control group and one or more experimental groups. That keeps everything controlled when it comes to other effects that might be going on with those users, things like day-of-week effects. Everybody has an equal chance of ending up in an experimental condition or not, so there's no selection bias, where certain people, based on certain characteristics or behaviors, are being put into an experimental condition while others are not.

Jason Rose:
So you're doing an A/B test, and it really boils down to the likelihood of someone seeing A.

Jason Rose:
So whether it's the A button on a website or the B button, whatever the variable is, they have an equal chance of seeing one or the other.

Tom Miller:
Well, it doesn't have to be equally likely, right? You could say, I only want to put 10 percent of my traffic into an experiment, and I only want to show 10 percent of that experimental segment the control. So I don't want to make it seem like it needs to be perfectly balanced. But you do need to be randomly selecting people into those groups. That's sort of a tangent to the question, though. The question is: OK, you're trying to test something that gets a very low volume of traffic.
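
As a rough illustration of the random assignment Tom describes, here is a minimal sketch of how a testing tool might bucket users. The function name, the 10 percent allocation, and the even split inside the experiment are hypothetical choices for illustration, not how any particular tool actually works.

```python
import hashlib

def assign_bucket(user_id: str, experiment: str, traffic_share: float = 0.10) -> str:
    """Deterministically bucket a user into an experiment (illustrative sketch).

    Hashing the user id plus the experiment name gives every user the same
    pseudo-random draw on every visit, independent of when they arrive or how
    they behave -- which avoids selection bias and spreads things like
    day-of-week effects evenly across conditions.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    draw = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform value in [0, 1]

    if draw >= traffic_share:
        return "not_in_experiment"  # e.g. only 10% of traffic is enrolled at all
    # Within the enrolled slice, split control vs. variant (need not be 50/50).
    return "control" if draw < traffic_share / 2 else "variant"

print(assign_bucket("visitor-123", "landing-page-headline"))
```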

Jason Rose:
So the way that you calculate how much traffic it's going to need is you back out from what your expected, or minimum, effect is that you're willing to accept for that experiment. Right.

Tom Miller:
And when you're talking about a low-volume page or condition, that effect is going to need to be larger, because the way that significance works is that it is easier to demonstrate significance on a larger effect than on a smaller one. Right. A very small effect needs a high volume of people within that experimental condition in order to prove it to actually be the effect of the experimental condition and not just due to random chance. For a larger effect, that volume is much less. So I would say with a lightly trafficked page or experience, or whatever you want to call it, you either need to be shooting for big effects or you need to be thinking about some way of understanding people's interaction with that page that goes beyond experimental methodology. And there are other ways that you can do that. You could focus group it. You can usability test it. You can do that in-house, or you can actually outsource usability testing. So maybe what you're trying to do is get someone to do something, your conversion event, but there might be some really obvious usability things that could be done to the page that might yield a great deal more conversions than sort of poking around with experimental tests. There might be something that you could achieve much quicker, see an effect, and do a sort of post hoc analysis on, as opposed to running an experiment.
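
To make the "back out from your minimum effect" idea concrete, here is a minimal sketch of the standard two-proportion sample-size approximation. The baseline rate, the lifts, and the thresholds in the example calls are hypothetical numbers, and this is a textbook formula rather than what any specific testing tool does.

```python
import math
from scipy.stats import norm  # assumes scipy is installed

def visitors_per_variant(baseline: float, lift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per group for a two-proportion z-test.

    baseline: control conversion rate (e.g. 0.03 for 3%)
    lift:     minimum detectable effect as an absolute change (e.g. 0.006)
    alpha:    significance threshold (0.05 is the 'academic' 95% standard)
    power:    chance of detecting the effect if it is real
    """
    p1, p2 = baseline, baseline + lift
    z_alpha = norm.ppf(1 - alpha / 2)          # two-sided test
    z_power = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / lift ** 2)

# A tiny lift on a 3% baseline needs far more traffic than a big one,
# and relaxing alpha from 0.05 to 0.20 shrinks the requirement further.
print(visitors_per_variant(0.03, 0.003))               # tiny lift: tens of thousands per group
print(visitors_per_variant(0.03, 0.015))               # big lift: a few thousand
print(visitors_per_variant(0.03, 0.015, alpha=0.20))   # big lift, looser threshold: fewer still
```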

Jason Rose:
What do you define as a big enough effect that the lower volume won't necessarily matter? Does that make sense?

Tom Miller:
Well, it really depends on the volume. So, you know, one thing that I have noticed is that this is sort of a difficult nut to crack, right?

Tom Miller:
Because this experimental volume requirement, for a lot of lightly trafficked sites or lightly trafficked interfaces, is onerous for testers sometimes. And I've noticed that the threshold in the testing community has sort of crept down. You know, the academic standard is usually ninety-five percent. So, you know, whenever you hear about the p-value: I used to run interface tests in college, and our confidence level was always 95 percent, which means that there is a five percent likelihood that the effects you are seeing are due to randomness and not due to the actual experimental effects. Now, I've seen that creep down, rightly or wrongly, to 90, to 80, among people that are practicing testing within companies. Now, 80, that's a one-in-five chance of being incorrect, basically. Right? Of running a test and seeing a success, but that success not being due to the experimental condition, being due instead to the sampling, with people that were more likely to convert landing in one sample set disproportionately.

Jason Rose:
Right.

Tom Miller:
When you think about that, like a landing page on an e-commerce site, one in five: if you're optimizing toward something that could potentially be 20 percent wrong, that's a lot of lost revenue.

Tom Miller:
Right. And so, you know, I like to use the coin flip analogy when I talk about this. It's very unlikely that if you flip a coin a thousand times, it's going to come up heads every single time. But it's possible, right? And you can express that probability. And it's sort of the same concept when it comes to the idea of significance. And, you know, 80 percent, to me, it's like, sure. And here are some good reasons to have a relatively low threshold for significance. One of the most difficult things about testing is pacing. If you're working in an organization, the way that you want to be testing is you don't do one-offs. You do a testing program, and that testing program is looking at key interfaces over a long period of time. You're playing the long game of testing, not the short game.
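
To put rough numbers on the coin flip analogy and on what an 80 percent threshold trades away, here is a small sketch; the ten-test program in the example is a made-up scenario, not data from the episode.

```python
# A thousand heads in a row is possible, just vanishingly unlikely.
p_all_heads = 0.5 ** 1000
print(f"P(1000 heads in a row) = {p_all_heads:.3e}")

def chance_of_a_false_winner(alpha: float, n_tests: int) -> float:
    """If every change you test truly does nothing, this is the chance that
    at least one of n_tests independent tests still 'wins' by luck alone."""
    return 1 - (1 - alpha) ** n_tests

for alpha in (0.05, 0.20):
    risk = chance_of_a_false_winner(alpha, n_tests=10)
    print(f"alpha={alpha}: ~{risk:.0%} chance of at least one false winner in 10 tests")
# Roughly 40% at the academic 95% standard vs. roughly 89% at an 80% threshold.
```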

Tom Miller:
And setting your threshold at 80 gives you a great deal more ability to iterate, because you're not stuck with a bunch of iterations in a row that don't do anything, right?

Jason Rose:
So by keeping your standards lower, you're actually able to drive the ball down the field further or more often.

Tom Miller:
Right. Even though you might not necessarily be doing things that are actually doing what you think they're doing in moving toward that goal, you're continuing to iterate on a process that, within a lot of corporate environments, is a difficult one to iterate on. It really takes a lot of commitment for a marketing team to invest in testing. And so keeping your threshold low means that you're able to say, OK, we've had some success, and over time the successes are going to bear out. What you're willing to do is sacrifice some error, and it's going to be a relatively minor matter, but you are willing to sacrifice some error over time.

Tom Miller:
That is, you know, roughly three or four times what it would be if you were using what would be considered an academic level of significance. So I think it's good. I mean, I think that pretty much answers the question. I guess the short answer is: it's difficult, so go for huge changes, right? That's the practical...

Tom Miller:
That's the bottom-line takeaway. Yeah.

Jason Rose:
All right. The second question comes to us anonymously through the bit.ly form. And the question is: how do you explain the differences between digital media clicks and clickstream tool sessions, and why are they sometimes so different?

Tom Miller:
All right. So this is a super common question, and it comes up client-side a lot. Why is it that when you're investing in digital media, and your digital media venues are reporting to you a certain number of clicks, and in a lot of cases you're paying for those clicks, why is there a discrepancy between that click metric and the number of visits or sessions that your clickstream analytics tool, say Google Analytics or Adobe Analytics or whatever you're using, reports that you're receiving? So, you know, I'll turn the question to you. Why do you think this could happen?

Jason Rose:
I mean, any number of reasons? Maybe something to do with how the tags are firing, or maybe the browser interacts differently depending on the tool that's measuring it.

Tom Miller:
I mean, definitely that. Right. So, you know, your clickstream analytics tool is client dependent.

Jason Rose:
Right. So that means that your browser is actually executing that tag. The click tracking, on the other hand, is going to be venue dependent. So as soon as you click that link, the...

Tom Miller:
...the venue's tracking is going to kick in. So the way that usually works is you're clicking an internal link on the venue, which then redirects to your landing page. The link that you click is what tracks that actual click as a metric. Right? Just to frame that for everyone.

Jason Rose:
Great. Yeah, amazing. Are there some other reasons you could go through? Sure.

Tom Miller:
I mean, ad blocking is a big thing, right? Obviously, if you have an ad blocker on, it would be strange to see the ad at all. But you could be in a situation where you're opting out of, say, the clickstream analytics tool, but not opting out from whatever the ad is that you're clicking on. And so you click, your visit goes silent within your clickstream analytics tool, but you're still being tracked as a clicker through the ad venue. You know, another way that could happen is that clickstream analytics tools do post-processing. Your browser makes a request to Google Analytics, and then Google Analytics actually evaluates that request. We don't have to just be talking about Google Analytics, but that request gets evaluated, it gets geotagged to your IP address, and there is some post-processing that goes on. Some of that post-processing could involve your initial page view, or your actual session, being excluded for some reason. Maybe the tool thinks that you're a bot or a spider. That could potentially happen, right? You get filtered out. Maybe the profile, the view that you're using in Google Analytics, for instance, is filtering you out based on your location or your IP address or some other reason. Obviously, that's a tool configuration error that you're making in evaluating your ad venue, but that could certainly be the case. Why else? You know, there are also a lot of instances where people are clicking, landing, and escaping, or clicking, landing, and leaving in a way where the page never loads.

Tom Miller:
Right. So if your page doesn't load, the JavaScript on your landing page never fires. Well, you've certainly clicked, but if there's an error, or there's a long period of time between the click and the page load, you could lose people, right? You can lose people because there's an error and the JavaScript never fires, and most clickstream tools are dependent on JavaScript. Or it simply takes too long to load, or the tracking tag takes too long to load, and somebody leaves your landing page before that tag gets a chance to fire. Now, most tracking tags are fired asynchronously these days. So typically, when you land, if you're using a tag manager, the tag manager fires, then it asynchronously and very quickly makes a request to whatever your clickstream tool is, and that's sort of happening as a separate thread. But if you're not using asynchronous tags, if you're using classic tags, the page needs to load before your tag fires. So you could reach a situation where other JavaScript, or just page load issues, are really slowing things down. And you can have tags creating race conditions with each other, where tags are waiting for each other to fire, things like that. And, yeah, you could just really have a tagging problem if it's something that's persistent.
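
As a back-of-the-envelope illustration of how the reasons above stack up, here is a hypothetical walk from venue-reported clicks down to analytics sessions. The loss rates are invented purely for illustration; they are not benchmarks from the episode or from any tool.

```python
# Hypothetical walk from ad-venue clicks to clickstream sessions.
clicks_reported_by_venue = 10_000

# Illustrative loss rates -- invented for the example, not measured benchmarks.
loss_factors = {
    "double or accidental clicks": 0.02,
    "analytics tag blocked or opted out": 0.08,
    "left before the JavaScript tag fired": 0.05,
    "filtered as a bot/spider or by a view filter": 0.03,
}

remaining = clicks_reported_by_venue
for reason, rate in loss_factors.items():
    lost = round(remaining * rate)
    remaining -= lost
    print(f"{reason}: -{lost}")

print(f"Sessions the clickstream tool might report: ~{remaining}")
# The venue says 10,000 clicks; the analytics tool sees noticeably fewer sessions.
```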

Jason Rose:
So if I'm a brand and I'm seeing this discrepancy between clicks and what my clickstream tool is reporting, how do I make sure I'm not afraid of this? Do you believe one to be more accurate than the other? I mean, what is the next line of thought once you realize there are reasons for the discrepancy? What do you trust?

Tom Miller:
Honestly, you need to use the clicks as an indicator, as an evaluative metric on the media. But really, the way that you need to value your media is from some type of conversion event.

Jason Rose:
Right. So cost per click, cost per session, you know, those are not such good media metrics. What you really want are return-on-ad-spend metrics, related to either direct response or the long-term value of your customers. So, you know, at the end of the day, some of these venues are going to be more accurate than others, or seem to be more accurate than others. But really, what you need to do is land on what our definition of success is, and what that conversion event is, and then evaluate the cost of your media based on that, and your return on ad spend based on, ultimately, what it's doing for you and your business, instead of stressing about clicks. I mean, there are other ways that this gap happens: you could just rapidly click twice, you could click, go back, and click again. There are a million ways that this could happen. That's good.
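
To make the "value media on conversions, not clicks" point concrete, here is a toy comparison of cost per click versus return on ad spend. The two venues and all of the spend, click, and revenue figures are hypothetical.

```python
def cost_per_click(spend: float, clicks: int) -> float:
    """Activity metric: what you paid per click; says nothing about value."""
    return spend / clicks

def return_on_ad_spend(conversion_revenue: float, spend: float) -> float:
    """Outcome metric: revenue attributed to the media per dollar spent."""
    return conversion_revenue / spend

# Hypothetical venues: A looks cheaper per click, but B actually returns more.
for name, spend, clicks, revenue in [("Venue A", 5_000, 12_500, 9_000),
                                     ("Venue B", 5_000, 8_000, 14_000)]:
    print(f"{name}: CPC ${cost_per_click(spend, clicks):.2f}, "
          f"ROAS {return_on_ad_spend(revenue, spend):.1f}x")
```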

Jason Rose:
Don't get too hung up on the exact numbers if they slightly change. But yeah, don't stress over clicks.

Tom Miller:
Understand that you're buying sessions, right? And whatever your campaign objective is, stay focused on that instead of stressing about these discrepancies. I mean, no tool is going to be 100 percent accurate. If you feel like there is a major problem with your clickstream analytics tool, maybe we can cover that in a future episode, some techniques for auditing that. But, you know, understand that when you're talking about different platforms, in any case, your testing platform is probably always going to show a different number of users than your site and your clickstream tool, and your voice-of-customer platform might have a different number of uniques as well. It really has to do with defining who the humans are that are viewing your site. All these tools might have different understandings of that and of what it is you're tracking. So at the end of the day, what you really need to do is establish a tool of record when it comes to clickstream, establish metrics that are related to return on media, not just these activity metrics related to media, and just understand that when you're talking about a server-based tracking mechanism on an ad venue versus your JavaScript-fired, client-side-dependent tracking mechanism in a clickstream tool, they're never going to line up.

Jason Rose:
Never. Say it again: never going to happen. Never gonna happen. So that's it. All right, cool. I actually have one more...

Jason Rose:
...selfish question that I want to submit myself. So I was listening to a podcast last night on the drive home. Well, not a podcast; I was listening to a TED Talk.

Jason Rose:
His thesis compared how Netflix picked House of Cards versus how Amazon picked Alpha House as the TV shows they put out around the same time. And he said that both decisions involved data. Netflix looked at all the shows that were popular, whatever everyone was watching on their platform, and decided that House of Cards was a show they should invest in. Amazon, meanwhile, relied solely on the data: they put something like eight pilots out for free, tracked how people watched those shows, and then ran it through some kind of algorithm, I'm not sure exactly how they processed it, but it came out and said that Alpha House was the show they should make. And the comparison he drew was: House of Cards was extremely popular, Alpha House was not. And he said this was because Netflix used data to break down what the problem was and then used their brains to say this is what makes sense for us to do, while Amazon relied too much on data, both to frame the problem...

Tom Miller:
...but then also to solve it. You know, it's a fascinating difference, in that the shows are sort of similar in some ways. I mean, they're really different in most ways, but some aspect of this implies that this D.C., sort of political thing was going to take off. Both of these groups seemingly independently landed on this idea of a congressman being at the center of a TV show, and then vastly diverged from there. It's a little bit of a pedantic statement, but it really goes to the difference between being data driven and data informed. It's sort of one of those things that's fallen out of favor, to say that we're a data-driven company, because it implies that you are, you know, almost like Skynet, the AI doing the deciding. You know what I mean? Whereas data informed means that you are trying to arm yourself with as much information as possible and using your own... I don't even want to say gut feeling, because that's sort of the opposite of being informed. But you're using the data to execute on your business strategy, whatever that might be. Right.

Jason Rose:
As opposed to letting the data dictate the strategy. You know, trusting the complex problem-solving that we have the ability to do with our brains, instead of saying, right, I'm just going to use the data. Yeah.

Tom Miller:
And you're talking about something that at the end of the day is a work of art and creativity. Right. So at the end of the day, you know, it's certainly subjective.

Jason Rose:
Alpha House could have gotten Kevin Spacey on it, and that might have been the hit show.

Tom Miller:
That's right. And Alpha House isn't a bad show now, by any stretch.

Jason Rose:
It just wasn't the huge hit that House of Cards was; that's the distinguishing thing. But, yeah, it's certainly by no means a bad show. Right.

Tom Miller:
So, I mean, that's sort of where I see it, right? It's this concept where you can't let go just because you have a great handle on your data and what you think your customers are telling you. Ultimately, at the end of the day, what you're trying to do is deliver positive outcomes for your customers. The data needs to enable that, but it also can't get in the way of that. If that makes sense.

Jason Rose:
Yeah. Cool. The TED Talk that we were just talking about was from Sebastian Wernicke, and it's called "How to use data to make a hit TV show," if you guys want to check it out.

Jason Rose:
All right. We can talk about it more next time, and you can maybe poke some more holes in it and get a little deeper. All right. Well, thanks for listening, guys. You can all submit questions at bit.ly slash measured direction.

Tom Miller:
We look forward to answering them next time. Hey, what's your Twitter handle?

Jason Rose:
It's @jaytrose: J, A, Y, T, R, O, S, E.

Tom Miller:
Awesome. I'm at T Miller. We appreciate the follows, and we really appreciate the question submissions. Keep them coming in, and we will see you next time.