Measured Direction Podcast
#3 - In-Store Bot Spam
In this episode, we discuss Offline Attribution (Thanks @QUIjacqui82!) and have an extended discussion on Bot Spam and Ad Fraud.
Show Links:
Google Adometry
White Ops
#measure Slack
Transcript
Jason Rose:
Hey, guys, welcome to the third episode of Measured Direction. I am Jason Rose, a content strategist here at Digital Surgeons, and I'm joined by the leader of our analytics practice, Tom Miller. What's up, Tom? What's up? Let's get right into the first question. How can you measure the impact of digital marketing efforts on in-store sales in a quantifiable way? And this question comes from an account manager here at Digital Surgeons, Jacqui Montano. She can be found at @QUIjacqui82. So: Q-U-I-j-a-c-q-u-i-8-2.
Tom Miller:
Wow. Great question. And from Jacqui, with a horrible Twitter handle.
Jason Rose:
Krieg's equity. No, but it really is a great question.
Tom Miller:
You know, this is a bit of a longstanding question that's been around since sort of the beginning of digital marketing, right? It's: what is really the lift? And if you're not an e-commerce company, but you are selling a product, what are the effects of digital on in-store? So let's talk about, you know, plug and play, which is the easiest way to go here. There is a lot of technology, a lot of emerging technology, that helps bridge that online to offline. There are panels now that use your phone's location. So one of the great enabling technologies for getting more of this shopper information is that people have their phones on them. And so there are panels now that you can opt into.
Tom Miller:
So you install an app on your phone. And that app has locational awareness. And when you go into certain types of places, it will actually track you kind of like how a Nielsen box would track what you look at on the television.
Jason Rose:
It's kind of cool. I saw that Amazon actually wants to do this. I mean, this was just a rumor thing, I was reading an article, but Amazon's opening their bookstores, and they want to set it up so that people can walk into their bookstore, pick a book off the shelf and just walk out of the store, and not even talk to the cashier. Everything just almost becomes automated through the app or the panel or whatever they have. But it's along the same lines.
Tom Miller:
Exactly. And think of how much easier it is for Amazon's digital marketers to do an attribution model based on that, because they know they've targeted you with an ad, right? They know what ad is getting you to the store, what ad is actually getting you to purchase, et cetera, et cetera. And they can probably predict, even before you walk into that bookstore, based on what you've viewed on the site, where you've lingered and what your interests are, where you're going to end up when you walk into that store, and they could pretty much just point you straight to it.
Jason Rose:
So I have another question for you. Do you shop at CVS? I do. Actually, that was my first job, working at CVS. So you have an ExtraCare card. Yeah.
Tom Miller:
So, you know, CVS, not to single them out, but they do an amazing job with their loyalty program: the ExtraCare card, their ExtraBucks, et cetera, their coupons. There's a very strong online-to-offline conversion component in how they're marketing online, right? So I'm an ExtraCare card holder. I can order on CVS.com. They have a big e-commerce platform. I've personally never used it. But what I do use is this: they send me coupons to my email box. I actually have to click in to CVS.com to associate that coupon to my card. And then I'll go into the store and redeem it. And, you know, that's a very, very specific way of using a loyalty program or a store card to target me, to get me to come into the store, and then to understand my shopping dynamic, because I'm very strongly incentivized to associate my purchases at CVS to the profile that they have on me.
Jason Rose:
I mean, this was a decade ago, I guess, that I worked there, but that was the biggest thing they drove into us from day one. Day one, whether the manager heard it or not, the first thing you ask when a customer comes up is: do you have an ExtraCare card? If we didn't say that, that was tantamount to the worst thing you could do while being employed there. So, I mean, they were early to that party as far as really understanding the importance of that data.
Tom Miller:
And, you know, the value of that data to them, in a lot of cases aggregated, is probably way more than the discounts that they're offering, right? Those ExtraBucks that you get. You get five ExtraBucks, and, you know, the marketing data within those transactions is probably worth way more to them than that.
Jason Rose:
I mean, this is a tough question to just throw at you, but what would you estimate the value actually to be? So five dollars of ExtraBucks are going out. What do you think the data's worth?
Tom Miller:
Oh, man. Well, that is a really tough question. All right, so let's break it down. I mean, CVS is operating on small margins, not micro margins. It's not like a grocery store. They're not trying to achieve a five percent margin on cost of goods sold, right? They're probably trying to achieve more like a seven percent margin on cost of goods sold. This is a number we could probably look up, so I hope I don't sound like an idiot.
But regardless, what you're trying to say is, OK, their primary metric, probably, for the part of their digital marketing team that isn't involved with e-commerce, is: how can we get people in store? Because they know that if you go to CVS, you're not just buying one thing, right? You're going in, and there's a cost per transaction, or a value per transaction, and there's a long-term value. And that long-term value can be calculated based on your lifetime, but probably more like based on a forward-looking or backward-looking window of 12 months. And so what they're trying to do is raise those values up, because they know that at a certain threshold you become a much more profitable customer to them, because they're spending less actually marketing to you. That's a classic.
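The backward-looking 12-month value Tom describes can be sketched in a few lines. This is only an illustration: the function name `trailing_value`, the 365-day window, and the purchase history are all made up here, and nothing is known about how CVS actually computes it.

```python
from datetime import date, timedelta

def trailing_value(transactions, as_of, window_days=365):
    """Sum a customer's purchase amounts in the window ending at `as_of`."""
    start = as_of - timedelta(days=window_days)
    return sum(amount for day, amount in transactions if start < day <= as_of)

# Hypothetical loyalty-card history: (purchase date, basket value in dollars).
history = [
    (date(2015, 1, 10), 42.50),  # falls outside the 12-month window
    (date(2015, 9, 3), 18.25),
    (date(2016, 2, 14), 31.00),
]

value_12m = trailing_value(history, as_of=date(2016, 3, 1))
print(f"Trailing 12-month value: ${value_12m:.2f}")
```

A marketer could compare this figure against whatever profitability threshold the business has chosen.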
Jason Rose:
Like, it's much cheaper to keep a customer than get a new one.
Tom Miller:
Exactly. Exactly. So there's that. And then, you know, the other option, I guess, would be store cards. Store cards are a very similar mechanism to loyalty programs. But, you know, if you use a store credit card, like a Target card, for example, Target's able to much better track what you're purchasing and how you're purchasing it, particularly within Target, right? There's no barrier there to what they can look at. So, I mean, let's say you're a smaller company. You don't have the resources to even have a loyalty program. Although I would argue that really any small brick-and-mortar shop ought to have a loyalty program, just for the purpose of understanding your customer a little bit better and marketing to them a little bit better. You know, what else can we look at?
Jason Rose:
Well, actually, before we get further off this, the second you brought up Target, it reminded me of something. Remember a couple of years ago when Target got in all that trouble, and there was controversy because they were predicting when women were pregnant? For those that don't know, more or less what the story was is that Target came up with an algorithm based on the idea that women who are a couple of months pregnant, six months pregnant, nine months pregnant, all buy certain products. So from in-store purchases, they were able to figure out which women were pregnant and send them catalogs that had a lot of things in them that women that are pregnant buy.
Tom Miller:
Oh, wait. It was way crazier. What they did is they looked at purchase behavior, and then they looked at demographic signals, and they basically created this predictive algorithm that said: you are likely to be pregnant based on your purchases, but also based on, like, where you live and how old you are. It was scary accurate. I mean, I can't speak to the actual accuracy of it, but the anecdotal evidence is that there were women that were pregnant but did not realize that they were pregnant, that got flyers in the mail that said, hey, here's all this baby stuff for you. And they were like, why is Target sending me this? And then they'd find out in a matter of weeks: oh, I'm actually pregnant.
Jason Rose:
I remember the big headline on Forbes was that Target told a 16-year-old girl's father that she was pregnant before he knew it.
Tom Miller:
Yeah, that's like one of the legends or that one of the anecdotes as well.
Jason Rose:
It's, you know, imagine having your 16-year-old daughter get this flyer, and then she actually winds up being pregnant, and you find out basically because of this flyer. I mean, it's kind of awesome, but it's also kind of scary. You kind of have to balance that. As a data person, do you find that crossing a line, or do you think that's just really good marketing?
Tom Miller:
You know, I think the line is more of a cultural taboo and potential bad press than a poor marketing decision, right?
Tom Miller:
So if I can figure out, based on some of your demographics and your purchase behavior or any other type of behavior, that you as a consumer are going to be highly likely to want to purchase tickets to the U.S. Open in Flushing in June, and I put an ad in front of you that gives you a compelling offer to buy that ticket, you're actually going to be pleased with that experience, right? I mean, I think part of the magic of using predictive analytics in advertising is that it's not really an interruptive process. The reason why we have banner blindness online, and why most digital advertising doesn't work, is because it's not something that's relevant to our lives. When was the last time you saw a banner ad that was even remotely relevant? The only thing I can think of is when I've gone on, you know, analytics technology vendor sites and been remarketed to. So I've been on Tableau.com and I see Tableau remarketing. I'm in the Tableau remarketing campaign. And that makes a lot of sense. But, you know, beyond that, the display advertising that I see is, by and large, garbage. And we're going to talk about this in the next question, I think. But yeah.
Tom Miller:
So I think that's that's pretty problematic. But that's not you know, that's sort of tangential to the question.
Jason Rose:
All right. So is there anything else you want to say to finish up the first question?
Tom Miller:
Oh, no, I don't. I can go on. So and also, you know, we sort of talk about customer data, customer purchase data, if you have access to it.
Tom Miller:
Obviously, that makes things really easy. The other thing that you can look at, if you are a retailer and you have a store locator function on your site, or you have a couponing function on your site, is that usage. Understanding the dynamics of that usage versus who's actually going in the store, at a macro level, is a great indicator. So you seek out those behaviors that signal purchase intent, or store-visit intent, and then you can use that as a proxy for purchase behavior. So, you know, you can't optimize your advertising to driving people into the store, just because there is no direct data link. But you can optimize your advertising very easily to behavior that signals purchase intent.
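One way to sanity-check a proxy like store-locator usage at a macro level is to see whether it moves with store traffic over time. The sketch below computes a plain Pearson correlation; the weekly numbers are invented for illustration, and a high correlation would only suggest the proxy is reasonable, not prove a causal link.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly totals, made up for illustration.
locator_searches = [1200, 1350, 1100, 1600, 1800, 1500]
store_transactions = [8100, 8600, 7900, 9400, 9900, 9000]

r = pearson(locator_searches, store_transactions)
print(f"Correlation between locator use and store traffic: {r:.2f}")
```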
Jason Rose:
Makes sense. Yes. Is there any kind of potential risk there? I mean, are they always strong metrics, or sometimes not so much?
Tom Miller:
Yeah, I mean, the risk is that there's no direct tie, right? So, you know, does a store locator search mean that you're going to go to the store? I mean, I think that it's a pretty wonderful KPI to be driving people to store locators, but a visit is not necessarily a thing that happens. There are a whole lot of contexts for using a store locator. You could be in your car, or about to leave somewhere, and you want to visit, say, a McDonald's that's on your way. I won't use that example, because they're everywhere. You want to visit the Chick-fil-A in Wallingford. You want to visit Chick-fil-A on your way home. You type Chick-fil-A into your phone and you learn that it's in Wallingford. So that's an in-the-moment, I want to go to this place, let's go. Or you might be sitting at your desk at work and you might be curious about whether there's an auto parts store near your house. I'm dropping all these names, but it's easier to talk about specific brands. So you might use the Advance Auto Parts store locator in that case, but you might not actually be going to the store that day. It might be a weekend thing or something like that. So, again, it's not perfect, but it's pretty great. And you can do this with voice-of-customer research: understanding the contexts of the usage of the store locator is pretty insightful as well.
Jason Rose:
Cool. Yeah.
Jason Rose:
Is there any way to really actually meet the customer at that exact point of sale?
Tom Miller:
Yes. So, yeah, sure. I mean, you're saying understand the online dynamic at the point of sale. So, you know, a lot of companies do this, small and large companies. They will actually do a survey upon purchase, either randomized or for every single customer. You know, where are you coming from? How did you find out about us? Have you seen our Web site? Can we get your email? And that sort of begins building a latticework of data between the online and the offline. That's something that happens when you have the customer in front of you and you're able to, in a very deliberate way, ask them something relevant to themselves. And you can also do that online. I mean, you really ought to be performing voice-of-customer research on your Web site to understand those types of dynamics.
Jason Rose:
Are there any other customer analytics methods that you want to discuss before we wrap up?
Tom Miller:
You know, I mean, at the point of sale, there's the Bed Bath & Beyond method too. So rather than asking a very specific question, you ask one question to everybody, which is: what is your zip code? And it always annoys the crap out of me. But what you can do with that is understand your media mix, so you can do media mix analysis really well. In that case, you can say, OK, we're up in digital. I mean, it's digital and traditional media. When you have all that segmented by zip, and you have a customer sample by zip, and you know what people are buying, it really makes it much easier to do factorial analysis on your media mix. So you could say, OK, we're heavy digital in these zip codes. We're heavy TV in these zip codes. Let's see what the differences are in-store.
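A minimal sketch of the zip-code comparison described above: group in-store sales by the media mix running in each zip code and compare the averages. The zip codes, the "digital" and "tv" labels, and the sales figures are all hypothetical, and a real analysis would control for store size, demographics and seasonality.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (zip code, dominant media in that zip, weekly in-store sales).
rows = [
    ("06510", "digital", 5200.0),
    ("06511", "digital", 4800.0),
    ("06492", "tv", 4100.0),
    ("06460", "tv", 4300.0),
]

# Collect the sales figures for each media mix.
sales_by_mix = defaultdict(list)
for zip_code, media, sales in rows:
    sales_by_mix[media].append(sales)

avg_sales = {media: mean(values) for media, values in sales_by_mix.items()}
lift = avg_sales["digital"] - avg_sales["tv"]
print(avg_sales, f"digital minus TV: {lift:.0f}")
```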
Jason Rose:
Maybe the zip code almost becomes like the control and the experiment.
Jason Rose:
It's like the key for everybody, right? Yeah.
Tom Miller:
And, you know, the other thing you can do is media mix analysis in similar regions. And this is more of a media mix question. But, you know, there are certain cities that have very similar cultural and demographic characteristics. And so you can do media mix testing, or even holdout testing, between those cities. So, for instance, Pittsburgh and Cleveland are pretty similar in a lot of ways. Buffalo and Rochester. Give me some other ones. Nashville and Louisville. They're very similar cities, culturally and in size and demographics. And you can do some experimentation on them to really understand what media is working and what mix, considering them as almost equals.
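One common way to read a matched-city holdout test is a difference-in-differences comparison: the change in the test city minus the change in the held-out city. That technique is swapped in here as an illustration; the episode does not name a specific method, and the sales figures below are invented.

```python
def diff_in_diff(test_before, test_after, control_before, control_after):
    """Change in the test market minus change in the matched control market."""
    return (test_after - test_before) - (control_after - control_before)

# Hypothetical weekly in-store sales. Pittsburgh gets the extra digital spend;
# Cleveland is held out as the comparison market.
pittsburgh = {"before": 100_000, "after": 112_000}
cleveland = {"before": 98_000, "after": 101_000}

incremental = diff_in_diff(
    pittsburgh["before"], pittsburgh["after"],
    cleveland["before"], cleveland["after"],
)
print(f"Estimated incremental weekly sales: {incremental}")
```

The estimate is only as good as the assumption that the two cities would have moved together without the extra spend.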
Jason Rose:
Right. All right. Sounds cool. All right. So ready for the next one?
Tom Miller:
Yeah, I guess we went straight from the top. Do you want to give our intro?
Jason Rose:
Yeah, I guess we'll do the intro now. So my name is Jason Rose. I'm a content strategist here at Digital Surgeons. And this is Measured Direction, an audience-driven analytics podcast that we do with the leader of our analytics practice, Tom Miller. So every week, you know, there's bit.ly slash measured direction, where people can submit questions and we do our best to answer them, or I should say Tom does, and I pick his brain a little bit. As for Digital Surgeons, we're a design and innovation firm in New Haven, Connecticut, doing some really exciting, innovative marketing work, and I urge you to go to our Web site and check it out.
Tom Miller:
Awesome. Yeah, I would second the encouragement to anyone listening. If you want to hear us discuss anything related to analytics, customer analytics, digital analytics, or digital marketing in general, please submit a question. I mean, this podcast is for our users and really by our users. It's Bitly, b-i-t dot l-y, slash measured direction.
Jason Rose:
Also, I think we should plug your appearance on the Power Hour.
Tom Miller:
I think it goes live next week.
Jason Rose:
So probably by the time this is all produced and published, and mixed by our technologist Adam Chambers. We should give him a shout-out. He is doing a great job mixing these, and you'll hear his audio bumpers from Camden Sound, a duo, that he's set at the front and back of this episode. But either way, yeah, check out Tom on the Digital Analytics Power Hour.
Tom Miller:
You can find them on iTunes, or just search Digital Analytics Power Hour. It's my favorite podcast.
Tom Miller:
Think about Ballan.
Tom Miller:
It's my second favorite podcast behind the Adventures.
Jason Rose:
That's where I go and sort of thing. All right, great. So let's go to the second question here. What are the impact of bot spam?
Jason Rose:
Yeah, this is the list.
Tom Miller:
I guess this is sort of a poorly worded, anonymously sent question, but the question reads, and now I'll do a little editorial on it: what are the impacts of bot spam in my analytics reports? And, you know, this is another one of these questions where I think we're going to talk a little bit about a specific product, and then we're going to talk more about a macro-level problem. But it's really an ambiguous question, so I'm going to take this opportunity to talk broadly about it. But, yeah, I mean, I think there are two issues, right? There's this thing going on, and it's been happening for, I don't even know, a couple of years at least, where people, and really hackers, if that's even the right word, annoying spammers is the best word, are taking advantage of and abusing the Google Analytics Measurement Protocol. And what that allows you to do is pass data to Google Analytics programmatically. And so what you'll see, and if you own any Google Analytics accounts you've probably seen this, is fake events being fired into your account. Fake page views, or page views that look like domains, being fired into your account. And it's typically for SEO link-building services or traffic services, what have you. And it's all bogus, right? It's all totally sketchy organizations you would never want to do business with. You know, what they're doing is they're just writing scripts. And you could write a Python script to do this very easily.
Jason Rose:
And they're just literally firing requests to Google programmatically, iterating through all of the potential UA account numbers that exist, and hoping that, you know, out of the millions of annoying spam particles they're putting into the atmosphere, one or two of them will land them a client. Right.
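To make "pass data to Google Analytics programmatically" concrete, here is a minimal sketch that only builds the body of a Universal Analytics Measurement Protocol pageview hit. The parameter names (`v`, `tid`, `cid`, `t`, `dh`, `dp`) come from the protocol's published parameter reference; the helper name `build_pageview_hit` and all values are placeholders, and nothing is sent anywhere. The thing to notice is that the hit carries no secret, which is why a property ID that can be guessed can also be spammed.

```python
from urllib.parse import urlencode

def build_pageview_hit(tracking_id, client_id, hostname, path):
    """Return the URL-encoded body of a Measurement Protocol v1 pageview hit."""
    return urlencode({
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, in the form UA-XXXXX-Y
        "cid": client_id,    # anonymous client ID
        "t": "pageview",     # hit type
        "dh": hostname,      # document hostname
        "dp": path,          # document path
    })

# Placeholder values; nothing is sent over the network here.
payload = build_pageview_hit("UA-XXXXX-Y", "555", "www.example.com", "/pricing")
print(payload)
```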
Tom Miller:
That's the same idea as Cialis spam in your e-mail. Yes. It's not you or I that's making that happen. It's the occasional sucker that actually clicks the link and buys Cialis from, like, a Russian Cialis site.
Jason Rose:
Right.
Jason Rose:
So what's the impact of all this spam traffic?
Tom Miller:
I've gotten a chance to work with some companies on this in the past few years, and sometimes it's really devastating to your ability to generate insight from analytics. Or maybe not devastating. It doesn't stop you, right? It just makes it much more difficult. And if you're dealing with, you know, a large shared Google Analytics account, it sort of creates a data governance issue, where your definition of what is true and real traffic isn't the same as what is defined by the product itself. You know, I've seen some pretty large publicly traded CPG companies whose main Web site's main Google Analytics profile has somewhere on the order of 30 to 40 percent of the traffic that they're seeing, or what they think they're seeing, actually not being real traffic. It's Measurement Protocol abuse. So it's a problem. I mean, it's really misleading when you're looking at your marketing goals, and they really don't mean anything, because everything is messed up. Your conversion rates are messed up. The way that your events are being tracked is messed up. It's a problem.
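A quick worked example of how that distorts a conversion rate. The 35 percent spam share is picked from the middle of the 30 to 40 percent range mentioned above; the session and conversion counts are invented.

```python
def conversion_rate(conversions, sessions):
    """Conversions divided by sessions."""
    return conversions / sessions

# Hypothetical month of data for one view.
reported_sessions = 100_000
spam_share = 0.35      # assumed share of sessions that are spam hits
conversions = 1_300    # spam hits do not buy anything

real_sessions = reported_sessions * (1 - spam_share)
reported_rate = conversion_rate(conversions, reported_sessions)
true_rate = conversion_rate(conversions, real_sessions)
print(f"reported {reported_rate:.2%}, after removing spam {true_rate:.2%}")
```

Under these assumptions the report understates the real conversion rate by roughly a third.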
Jason Rose:
So what can stakeholders do? You know, whether it's their ad tech agency or just a digital agency in general, if I'm the lead for, you know, brand X, how do I keep my digital agencies honest about this, so that the KPIs they're reporting to me are accurate?
Tom Miller:
Well, I mean, you can't fall asleep at the switch, right? You have to understand it. I mean, this problem isn't that big of a problem, because it's by and large a technical problem. And if you properly implement your tags and your filters within Google Analytics, then it's not a problem.
Jason Rose:
So how do you do that? How do you go about it, actually?
Tom Miller:
Sure. We can talk about that. So, you know, I use filters generally. So filter on the referrer. You know, a lot of places are tracking blacklists of referrers. I use regular expressions to filter them out of profiles.
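A regular-expression referrer filter of the kind described can be sketched like this. The two domains are hypothetical stand-ins for entries on a community-maintained blocklist, and the hit dictionaries are a simplified stand-in for analytics data; inside Google Analytics itself the same idea would be expressed as an exclude filter rather than as code.

```python
import re

# Hypothetical blocklist entries; real lists are maintained by the community.
SPAM_REFERRERS = ["free-seo-traffic.example", "cheap-links.example"]
SPAM_PATTERN = re.compile(
    "|".join(re.escape(domain) for domain in SPAM_REFERRERS), re.IGNORECASE
)

def is_spam_referrer(referrer):
    """True when the referrer contains any blocklisted domain."""
    return bool(SPAM_PATTERN.search(referrer))

hits = [
    {"page": "/", "referrer": "https://www.google.com/"},
    {"page": "/", "referrer": "http://free-seo-traffic.example/offer"},
]
clean_hits = [hit for hit in hits if not is_spam_referrer(hit["referrer"])]
print(clean_hits)
```

The weakness of a blocklist is that it has to be updated every time a new spam domain shows up.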
Tom Miller:
One of the things that we've sort of noticed within the digital analytics community is that a lot of this spam data is being sent to the primary property. So the dash-one of your UA account. So setting up your primary, or at least your first primary, as dash-two: there is some evidence that this helps you avoid spam. The most powerful thing you can do is whitelist your property domain. So basically set up filters that require that your domains are the only ones included in the main view.
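The hostname whitelist can be sketched as an include rule: keep a hit only when its hostname is one of the site's own domains. The domain `yourwebsite.com` is a placeholder, and the hit dictionaries are a simplified stand-in for what an analytics view would see.

```python
import re

# Assumption: the site is only ever served from these hostnames.
VALID_HOSTNAME = re.compile(r"^(www\.)?yourwebsite\.com$", re.IGNORECASE)

def keep_hit(hit):
    """Keep a hit only if its hostname is one of the site's own domains."""
    return bool(VALID_HOSTNAME.match(hit.get("hostname", "")))

hits = [
    {"hostname": "www.yourwebsite.com", "page": "/products"},
    {"hostname": "spammy-domain.example", "page": "/"},
    {"hostname": "(not set)", "page": "/"},
]
main_view = [hit for hit in hits if keep_hit(hit)]
print(main_view)
```

Unlike a blocklist, this rule does not need updating when new spam domains appear, though it would need updating if the site starts serving from a new domain.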
Tom Miller:
You know, a lot of these spam robots that are doing this are not even bothering to understand that your Web site, www.yourwebsite.com, should really be the only domain that is serving this tag to its users. So Google gives you a pretty good workaround there with a filter. And then, you know, one of the most powerful things you can do is geo-limit your requests to that property. So really, 100 percent of the time, I start by excluding Russia, the Philippines and Indonesia.
Tom Miller:
You know, it's a business question for you. If you're a U.S.-only business, all of your customers are really only in the U.S., and all of your potential, current and future analytics reporting is going to be on people that are visiting your site from the U.S., set up a filter that just says, you know, include only the United States. And that will eliminate a lot of the problem, because most of these requests are coming from outside the United States, mostly from Russia. Yeah.
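A geographic include filter follows the same shape. This sketch assumes each hit already carries a resolved country name, which is a simplification; the helper name `include_countries` and the sample hits are made up.

```python
def include_countries(hits, allowed=frozenset({"United States"})):
    """Keep only hits whose reported country is in the allowed set."""
    return [hit for hit in hits if hit.get("country") in allowed]

# Hypothetical hits with a country already attached.
hits = [
    {"country": "United States", "page": "/"},
    {"country": "Russia", "page": "/"},
    {"country": "United States", "page": "/store-locator"},
]
us_only = include_countries(hits)
print(len(us_only), "of", len(hits), "hits kept")
```

This only makes sense for the business case described above; a company with international customers would be throwing away real traffic.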
Jason Rose:
And of course, since I just took the certification, I feel like I have to say this: of course you have to maintain one unfiltered view, right? That's very core.
Tom Miller:
Yes. You're always, you know, best practice. That is a best practice, of course. Yes. So you always have a raw look at your data.
Jason Rose:
You know, the extremely good, good practice.
Jason Rose:
So what was the second issue that you raised before?
Tom Miller:
Sure. So, you know, when I think about bot traffic, and it's a topic I'm a little obsessed with and have been for a few years, what I think about is ad fraud bots. The concept of headless browsers viewing pages across the Internet. I say viewing, but they're not humans, right? Headless browsers requesting pages across the Internet as part of a giant global organized crime ring, which is what it is. So I went to see a guy named Michael Tiffany, who's a co-founder of a company called White Ops, speak at a conference a few years ago. And it sort of opened my eyes to this world of hijacked computer capacity, which has come about because of these viruses that are on your parents' and your grandparents' computers in the living room, and how that is actually being monetized. Because I come from the 90s, right? I sort of got into this industry in the 90s, and back in the 90s, computer viruses were a major problem, and they were a problem because they were a vector for sending out spam e-mails. So your computer would get infected, and then the controller of your computer would just use some part of your computer's capacity and just spam, spam, spam, spam. That was a main way that they, they being the criminals that wrote the viruses and took over computers, made money. And what has happened, and this has sort of happened, I don't know, since the early aughts, is that there's been this really dramatic shift to create these headless browser programs that look and act like real people, but are visiting sites that are set up by these crime syndicates to basically serve banner ads to these fake people. And then they make money on that service. So there are these shady fake publishers that are monetizing fake traffic that they themselves are creating, as a traffic company or whatever, or they're selling that traffic to a company.
Tom Miller:
Right. So, you know, it's a major, major problem. And White Ops, I mean, they're a company, so they're incentivized to make this seem like a bigger deal than it might be. But they claim that bot fraud will be a seven billion dollar problem for marketers in 2016.
Tom Miller:
Seven billion with a B? Yeah. And, you know, I've seen stats thrown around, and I actually believe a lot of these stats, that 50 percent of web traffic is actually not human traffic. And part of that traffic is also what I would call benevolent bots. So, you know, you have search indexers. You have job listing indexers. You have all of these companies that are crawling the Web in a friendly and mindful way, taking the data from Web pages and indexing it in some way that's useful. So Google's the biggest example of it. They go on your Web site, they index your Web site into their search, and then you can search for things on the Web and find them, thanks to Google doing that work. You know, job sites like Indeed do the same thing. So Indeed is out there, and they're crawling Web sites looking for job listings, transferring them into their format, and then aggregating them all together.
Tom Miller:
So the point being that a headless browser isn't necessarily an evil thing. Not at all. There are very strong uses for them. It's just about using it for good, not evil.
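The benevolent bots are the easy case, because they announce themselves in the User-Agent header. A minimal sketch of separating declared crawlers from everything else follows; the pattern covers only two well-known crawlers as examples, and the important caveat is that ad fraud bots send ordinary browser user agents, so this check does nothing against them.

```python
import re

# Substrings that these well-known crawlers put in their User-Agent header.
DECLARED_BOTS = re.compile(r"googlebot|bingbot", re.IGNORECASE)

def is_declared_crawler(user_agent):
    """True for crawlers that identify themselves; fraud bots will not."""
    return bool(DECLARED_BOTS.search(user_agent))

agents = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
]
flags = [is_declared_crawler(ua) for ua in agents]
print(flags)
```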
Tom Miller:
So what has happened over the last 15 years or so is that these headless browsers are now able to execute JavaScript. And that is creating havoc with our analytics tags, advertising tags and other marketing tags. So what's happening is these headless browsers are coming on to sites, they're executing JavaScript, and they're acting completely like a regular, you know, person-driven browser. And the detection technology still hasn't quite caught up to the bot writers' ability to make a convincing case that this is actually a person. And part of the way that they do that, and part of why this is such a major and insidious problem, is they actually mimic real people's browsing habits. And that means that for every Web site, some percentage of its traffic is actually not a human being, because they're trying to make things look real. So they're following links. They're occasionally clicking banner ads. And they're landing on all these sites. So if you think about a company like Outbrain, for example, Outbrain places these ads on the bottom of a lot of content pages, recommended articles, say, and they're getting clicked by robots. And they're getting clicked by people, too. But you can imagine, if I'm a robot and I'm trying to monetize banner advertising on some site, I'm not just going to go to that site 50,000 times. What I'm going to do is surf around until I find that site, using an Outbrain link. Click through to the site. Register the, whatever it is, point-oh-one cent for the bot, for the organized crime syndicate. And then just rinse and repeat that across fifty thousand different computers that have been infected.
Jason Rose:
So in some way they're making some real money. Yeah.
Tom Miller:
They're making about seven billion dollars. Right. I mean, you.
Tom Miller:
And, you know, it's all well and good, and I guess some of the pushback on this is that a lot of digital marketers say, and I've read really eloquent arguments for this, that we shouldn't be worried about it. Because if I go and buy banner ads, most of the time I'm buying banner ads in an e-commerce situation. I'm trying to get people to discover my product and buy it online. And I don't really care if 50 percent of the viewers of my banners are human or not, as long as I'm seeing a good return on ad spend for my spend. In other words, if I'm marketing to 500 people and 500 bots, but 50 of those people actually come on my site and complete a transaction, I don't really care about the bots.
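The arithmetic behind that argument, using the 500 people, 500 bots and 50 purchases from the example. The cost per viewer and the average order value are assumptions added only to make the numbers concrete.

```python
# Numbers from the example above: 500 people, 500 bots, 50 purchases.
humans, bots, purchases = 500, 500, 50

# Assumptions for illustration only: media cost per viewer and order value.
cost_per_viewer = 0.50
average_order_value = 40.0

spend = (humans + bots) * cost_per_viewer
revenue = purchases * average_order_value
roas = revenue / spend

rate_all = purchases / (humans + bots)  # what the report shows
rate_humans = purchases / humans        # rate among real people
print(f"ROAS {roas:.0f}x, reported rate {rate_all:.0%}, human rate {rate_humans:.0%}")
```

Return on ad spend already has the wasted bot impressions priced in, which is why the marketer in this argument does not mind; the reported conversion rate, though, is half the rate among real people.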
Jason Rose:
It actually means your marketing is getting a higher conversion rate than you're reporting, in a way.
Tom Miller:
But the problem is that that seven billion dollars is being used for a lot of other bad things around the world, right? Human trafficking. Drug trafficking. And it's a lot of money. Civil wars in poor countries, you know, that's what's happening with that money. Manhattan real estate now. But so, you know, that's the problem.
Tom Miller:
And, you know, the it really started because of a combination of bad technology and bad incentive within the what I would call the display advertising industry. And to be fair, in the last two or three years, there's really been a major push. I mean, the major push to what is called view ability, quote unquote, view ability is really a push to eliminate a lot of this, not human traffic. I mean, you know, the Wavin View ability was spun was to say, oh, we want to make sure that people aren't buying ads that appear below the fold. Really what it is, is we want to make sure that people aren't buying ads that are below the fold to humans or just on the page to bots.
Tom Miller:
That's really the truth about what that push is.
Jason Rose:
I mean, what are the steps for solving this? How do you combat it?
Tom Miller:
Let's talk about how this problem came to be. Right. So we have this major incentive problem, because we have advertisers, particularly brand advertisers, who want banner inventory. Particularly, you know, as the economy rebuilt from the early 2000s, advertising became really major and important. Web advertising sort of got figured out after some major technological shifts. There was a lot of consolidation, and it became a really good way to spend marketing dollars online, or at least it was thought to be. Publishers, you know, you have this rise of the publishers. And I think the biggest publisher, probably the one that sticks out in my mind, having lived in DC, was the Huffington Post, which sort of came out of nowhere and got bought by AOL in, like, what, '07, '06, in there, for a massive, massive sum of money.
Jason Rose:
Yeah. Yeah. Like a billion. Yes. Which today doesn't seem like much money.
Tom Miller:
Yeah. It was a really big deal. And, you know, to this day, I still wonder, like, how much of that traffic that AOL thought it was buying is actually real people. Right. But, you know, so publishers wanted to sell the inventory. Publishers like, you know, I mean, Huffington is a great example. And then you have sort of these intermediaries, the media agencies of the time. And, you know, they make a cut of the total brand advertising budget. Right. So their incentive is to spend as much budget as possible. And so what you've got is, like, a messed up supply and demand curve, because at the time you also had this rise of, like, traffic brokers. Basically you could pay a company, you know, five cents for a thousand visitors. Right. And that's in air quotes, which you can't see on the podcast, but it's a thousand visitors, some of which may have been bots, all of which may have been bots. Right. They were concerned with supplying traffic to people. And so you sort of had companies selling a product with high demand but infinite supply. Right. And so publishers never wanted to turn a contract away. Right. Because it was mostly direct deals. There wasn't programmatic like that. So publishers never wanted to turn a contract away. So what they would do is they would take on any contract and just buy traffic to support it. Right.
Tom Miller:
And advertisers wanted their budgets to be spent. Right. So they put pressure on the agencies to fully spend out their budgets, and the last thing an agency wanted to do was say, hey, you know, we can only spend 18 percent of your media budget. That's a major problem, too. Right. So you can see where all these incentives sort of collided to create this ecosystem where fraud was very much able to grow and flourish. Right. And, you know, as publishers got smarter, as agencies got smarter, and especially as advertisers got smarter, because they're really holding the money bag, they pushed to make traffic more and more chaotic. So that's the push to viewability, the push away from non-human traffic. Right. But in the meantime, what has happened is that the bot creators have gotten more and more sophisticated in the way that they're writing the bots to make them pretty much...
Tom Miller:
I mean, you know, you can pretty much write them to be undetectable at this point. And it's unfortunate, but that is sort of where we are now with the state of display advertising on the Web. And, you know, I don't want to make any claims that that's how it is for any given publisher. Right. But, you know, you also have the rise of these other platforms and the rise of programmatic, which is sort of more properly pricing, right, it's more real-time pricing of the value of traffic. And then you also have players in the industry like Google and Facebook, among a slew of others, that have really been out in front of this issue and have taken great steps to ensure that their traffic is human. Now, Facebook, as an example, has an inherent advantage because you have to log in to their platform. Right. So publishers where you have to log in have an inherent advantage anyway, because a bot is not going to log in, right, unless they're really strongly incentivized to. Why would they? Right. So, you know, this is a factor in the rise of paywalls, because you know that if you're limiting your viewers to a certain number of articles, you're also limiting bots to a certain number of articles, so you're upping the overall quality of your banner pool. Yeah. I mean, there's a lot to it. So what do you do about it? I mean, so, you know, as I mentioned, I'm obsessed with this issue. And part of the reason why I'm obsessed with this issue is because I've actually helped build a solution to this problem. Right. So a few years ago, I actually wrote a, and it wasn't a single piece of software.
Tom Miller:
It was a system that basically scored a single browser session based on a number of factors, much like, I don't know if you used to have email with SpamAssassin on it, maybe in college, where you'd get, like, a questionable email.
Tom Miller:
It would have, like, a SpamAssassin score on it. So SpamAssassin, this is the model I used. Basically SpamAssassin has, let's say, like two dozen different factors. And as an email administrator, you can weight all of these factors, and, you know, some of the factors are like, where was the origination server? How many hops did the email take before it got to your server? Does it have the word Cialis in it? Right. All of these different factors. And basically, if your score reaches a certain threshold, then SpamAssassin just marks it as junk. Right. And then you can set up rules within your email system to deal with that however you want. Right. And so I set up a very similar system to do that with browser sessions.
Jason Rose:
What were the factors?
Tom Miller:
Sure. Yeah. So IP address blocks. Right. So certain IP addresses, particularly in certain countries, were just automatically spam. Right. I mean, they were just bots. There are also certain proxy servers within the United States that seem to have a much higher likelihood of being non-human traffic than others. The biggest one is browser user agent.
Tom Miller:
So, you know, when your browser makes a request to any site, it actually sends the user agent, which is basically your browser's fingerprint. I mean, that's not the right way to say it. It's the name of your browser, the version of your browser. So in the case of your computer, it might be Chrome 21. It's just a big string of characters that identifies your browser, and sometimes it identifies the plugins that your browser has, things like that. You know, that's actually a super easy one because of the libraries that are used to write bots, and I should talk about that as well. So there's this new, it's not really that new, but there's this server software that has been written in JavaScript. Right. It's called Node. And we use Node. I mean, it's very in. It's Node. And it just basically allows you to do all your server side scripting in JavaScript. And when Node sort of came to be, it's really an enabling technology for these bots, because you're able to write your fake headless browser in Node and use JavaScript almost natively.
Tom Miller:
Right. So you're not doing Python to JavaScript or Perl to JavaScript or whatever else you're writing these bots in. You're really writing JavaScript to execute JavaScript. That makes things much, much simpler. So all of these different libraries of code that you could write a crawler with, right, because that's essentially what you're writing, a web crawler, they all have their own user agents. Right. So it's always a really easy red flag that if someone's using a particular library and it shows up in their user agent string, well, they're a bot and they're not human. Right. So that's an easy one. There are also different types of user agent strings that you can score in different ways, that might be slightly more likely or slightly less likely to be a robot. Right. And then the other one that's big is behavior. So, you know, we looked at behavior in a number of ways. First was just impossible behavior. So we would look and see, OK, same user agent from the same IP address, they're making 500 page requests a minute. They're not a human, right? They're a bot. And some of the bots are really dumb and some of them are really smart, and the dumb ones are really easy to catch. Right. The smart ones are impossible to catch.
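The factors described so far (IP address blocks, user agent strings, and impossible request rates, each weighted and summed against a threshold in the SpamAssassin style) could be sketched roughly as follows. This is not the system from the episode: the weights, threshold, IP prefix, and marker list are hypothetical placeholders, and the only number taken from the conversation is the 500 requests per minute.

```python
from collections import defaultdict, deque

# Hypothetical weights and threshold; a real deployment would tune these.
WEIGHTS = {"bad_ip_block": 3.0, "bot_user_agent": 5.0, "impossible_rate": 5.0}
THRESHOLD = 5.0
# Illustrative only: 203.0.113.0/24 is a reserved documentation range.
BAD_IP_PREFIXES = ("203.0.113.",)
# Substrings that some HTTP libraries and headless browsers put in their UA.
BOT_UA_MARKERS = ("python-requests", "curl/", "phantomjs", "headlesschrome")
MAX_REQUESTS_PER_MINUTE = 500  # the figure mentioned in the conversation


class RateTracker:
    """Counts requests per key over a sliding time window."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.hits = defaultdict(deque)

    def record(self, key, now):
        """Store one request at time `now`; return how many are in the window."""
        q = self.hits[key]
        q.append(now)
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)


def score_request(ip, user_agent, now, tracker):
    """Return (score, reasons) for one page request."""
    reasons = []
    if ip.startswith(BAD_IP_PREFIXES):
        reasons.append("bad_ip_block")
    ua = user_agent.lower()
    if any(marker in ua for marker in BOT_UA_MARKERS):
        reasons.append("bot_user_agent")
    # Same user agent from the same IP address, as described above.
    if tracker.record((ip, user_agent), now) >= MAX_REQUESTS_PER_MINUTE:
        reasons.append("impossible_rate")
    return sum(WEIGHTS[r] for r in reasons), reasons


def is_suspicious(score):
    """A session is flagged once its summed score reaches the threshold."""
    return score >= THRESHOLD
```

Keeping the reasons alongside the score is a design choice for this sketch: it makes it possible to see which factor tripped the threshold when reviewing false positives.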
Jason Rose:
What percentage would you say are the quote unquote dumb bots like this?
Tom Miller:
I have no idea, because I have no idea what percentage of the smart ones aren't people. That's right. Absolutely no idea. So, that said, there's another technique you can use for behavior. We used the Honeypot Project. And so the Honeypot Project is this giant collective sort of data organization that gives you a little piece of code.
Tom Miller:
When you join it, it gives you a little piece of code and you put it on your site and it basically just creates a link that is invisible to the browser.
Tom Miller:
Right. And, um, unclickable in a browser. And so what happens is the bots are going through all the links on the page and they click it. And then all of a sudden that IP address, excuse me, gets logged within the Honeypot Project. And then what you can do is you can actually make a request to the Honeypot Project in real time and get a score for that particular IP address from there. So that's a huge factor.
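The hidden-link idea can be sketched in a few lines. This is a local, in-memory illustration only: it does not reproduce the shared service's actual code snippet or lookup API, and the trap path and class name are hypothetical.

```python
# Hypothetical trap URL; nothing a person can see ever links to it.
TRAP_PATH = "/do-not-follow"


def trap_link_html():
    """Markup for a link that is hidden from people but present in the HTML."""
    return (
        f'<a href="{TRAP_PATH}" style="display:none" rel="nofollow" '
        f'aria-hidden="true" tabindex="-1"></a>'
    )


class HoneypotLog:
    """Remembers which IP addresses requested the trap URL."""

    def __init__(self):
        self.hits = {}

    def handle_request(self, ip, path):
        """Log the IP if it requested the trap; return True when it did."""
        if path == TRAP_PATH:
            self.hits[ip] = self.hits.get(ip, 0) + 1
            return True
        return False

    def score(self, ip):
        """Number of times this IP has followed the hidden link."""
        return self.hits.get(ip, 0)
```

A crawler that follows every href in the page source ends up in the log, while a person who never sees the link does not. The collective version described above pools these logs across many sites so that a publisher can look up an IP address it has never seen before.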
Jason Rose:
It's almost like a liquor store that hangs the fake IDs up on the wall. Yeah, it's like a version of that.
Tom Miller:
Yeah.
Tom Miller:
And so what we did for this particular publisher is we said, OK, if a score reaches a certain threshold, don't stop the session. Right. This is an application level system. So everything was occurring sort of before the page was rendered. Right. Particularly on that first page view, there was actually a little bit of work going on behind the scenes. But what we could have done, and we did in a few cases for a few types of bots, is we said, kill the page. Like, you try to go to that URL and you see nothing. You see a blank screen.
Tom Miller:
There's no return. Right. In what I would say is probably, as far as traffic volume goes, ninety nine point nine nine percent of traffic, what we did is we said, don't fire any analytics tags, don't fire any advertising tags.
Tom Miller:
And so what that does, basically, from a user experience standpoint, is great. You know, better. Right. You don't get tracked at all. And what we did there is we were just hypersensitive to our users' needs. And we were hyper nervous about false positives. Right. And so the way that we mitigated that is we just said, hey, you know, no harm, no foul. We're not tracking you. We're not making any money off you. You look suspicious. But what we're also doing is we're not polluting our publisher's inventory anymore. It's just really a way of erring on the side of caution. Yeah. Correct. And, you know, it's an interesting problem. And, you know, a few years ago, it was a really fun sort of process to build. And it's not a product, but it was very interesting to sort of get into it.
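The policy described above (serve the page to suspicious sessions but fire no tags, and blank the page only for a few known bot types) might look like this in outline. The tag strings, threshold, and function names are assumptions made for illustration, not the publisher's actual implementation.

```python
from enum import Enum

# Placeholder tag markup; real analytics and ad tags would differ.
ANALYTICS_TAG = "<script src='/analytics.js'></script>"
AD_TAG = "<script src='/ads.js'></script>"


class Action(Enum):
    SERVE_NORMALLY = "serve_normally"
    SUPPRESS_TAGS = "suppress_tags"
    BLOCK = "block"


def decide(score, threshold=5.0, known_bad_bot=False):
    """Pick what to do with a session before the page is rendered."""
    if known_bad_bot:
        return Action.BLOCK          # the rare case: a blank page
    if score >= threshold:
        return Action.SUPPRESS_TAGS  # the common case: content, no tracking
    return Action.SERVE_NORMALLY


def render_page(content, action):
    """Return the HTML to send for the chosen action."""
    if action is Action.BLOCK:
        return ""
    if action is Action.SUPPRESS_TAGS:
        return content
    return content + ANALYTICS_TAG + AD_TAG
```

The appeal of the middle option is that a false positive costs a real reader nothing: they still get the article, and the only loss is one untracked, unmonetized page view.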
Tom Miller:
The other thing is you get user behavioral signals, which I should mention, by logging in. Right. So all of a sudden, you know, if you're scored high and then you log in, it sends a signal back in. And there are these feedback loops where we would say, hey, you know, our thinking on this particular IP address block or user agent is not right, because we know that a bot was overwhelmingly likely to not log in, or it's just, like, the most clever bot in the world, in which case, have at it.
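One way to picture that feedback loop: count how often sessions flagged by a given signal (an IP address block or a user agent) go on to log in, and scale that signal's weight down accordingly. The linear scaling rule and all names here are assumptions for illustration; the episode does not say how the adjustment was actually made.

```python
from collections import Counter


class FeedbackLoop:
    """Tracks, per signal, how many flagged sessions later logged in."""

    def __init__(self):
        self.flagged = Counter()
        self.logged_in = Counter()

    def record_flagged(self, signal):
        """Call when a session scores high because of `signal`."""
        self.flagged[signal] += 1

    def record_login(self, signal):
        """Call when a session flagged by `signal` then logs in."""
        self.logged_in[signal] += 1

    def false_positive_rate(self, signal):
        """Share of flagged sessions for this signal that turned out to log in."""
        total = self.flagged[signal]
        return self.logged_in[signal] / total if total else 0.0

    def adjusted_weight(self, signal, base_weight):
        """Hypothetical rule: shrink the weight in proportion to false positives."""
        return base_weight * (1.0 - self.false_positive_rate(signal))
```

For example, if half the sessions flagged for one IP block later log in, this rule would cut that block's weight in half.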
Jason Rose:
It's like what you said before, you know, the really, really good bots, right, you'll never be able to tell.
Tom Miller:
And there's other technology out there that we could use. I mean, there are obviously systems that you can buy. Right. So White Ops provides some of these systems. I give them props because they first sort of turned me on to this whole world. But there are other vendors out there where you can buy systems either on the ad serving side or, you know, on your application side.
Tom Miller:
Yeah. So it's pretty cool. The other thing you can do is you can set up, like, a CAPTCHA. Right. So Google now has, with their reCAPTCHA, almost an automatic CAPTCHA. You can leverage that in some way, or just force people on your Web site to click the box that says I'm a human. It's sort of a user experience nightmare. Right. And it's also like, why would we train people to go through the CAPTCHA process just to serve them ads and track them? But that's not really set of data. They're not really incentivized to do that.
Jason Rose:
I've even seen CAPTCHAs where it takes you a couple of tries as a human to be able to read them. Right. You know, it's like you said, terrible UX.
Tom Miller:
Right. And, you know, if someone were building this and building it even further, what I would do is I would do things like inbound links. So set up inbound links, particularly from email, in a way that signals, like, I'm very likely to be a person because I clicked on this link from this email, and that link would only be available to that person's email address. You know what I mean?
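A link that is "only available to that person's email address" could be built by signing the address with a server-side secret, so that a valid token is evidence the visitor arrived from that email. This is one possible approach, not something described in the episode; the secret, parameter names, and URL are placeholders.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Placeholder secret; a real system would load this from secure configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def sign_email(email):
    """HMAC-SHA256 of the normalized address, as a hex string."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()


def build_link(base_url, email):
    """URL to place in the email sent to `email` (parameter names are made up)."""
    return f"{base_url}?{urlencode({'e': email, 't': sign_email(email)})}"


def is_human_signal(email, token):
    """True when the token matches, i.e. the link came from that email."""
    expected = sign_email(email).encode("ascii")
    return hmac.compare_digest(expected, token.encode("utf-8"))
```

Usage would be along the lines of `build_link("https://example.com/welcome", "reader@example.com")` when sending the email, then `is_human_signal(e, t)` on the landing page, with a match lowering the session's bot score.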
Tom Miller:
Like, you could do it in a way where that would be a very strong human signal. Fortunately, you know, it seems like in the industry in the past few years, the economic incentives for the bot producers are far less. But it's one of these things that's going to be around for the rest of the Internet. I mean, there's always going to be some incentive, just like there's some incentive to send out Cialis emails still, to try to monetize this. You know, basically, you can just call it server capacity, but it's virus-laden computer capacity in some way. And this, you know, in the past fifteen years, has been a very effective way of monetizing it.
Jason Rose:
And I mean, you talked a lot about the late 2000s. It really became most insidious then, when it's not the thing that shows up on your computer and completely locks it up. When that happens, right, it's very hard, I can't use my computer. There are probably more of these viruses around right now, and no one bitches about viruses as much as they used to. But the problem is just as insidious, if not more, because it's invisible, right?
Tom Miller:
Yeah. That's what I mean. It's clever, right? I mean, like, the technology at play here is clever. And again, it's not super difficult to write a bot that is pretty good.
Tom Miller:
It's difficult to write a bot that is very good because, you know, there's additional technology out there related to scroll depth tracking, mouse location tracking. Right. That is baked into your browser. The browser technology is getting better at sort of thwarting this behavior. But, you know, again, it's always going to be there. Anybody can write a bot. Right. Like, I wrote a bot last year for my fantasy baseball league to, every single round of our draft, log in as me to our baseball provider.
Jason Rose:
And I don't see this going to get back, but log in.
Tom Miller:
Me to our baseball provider and just snapshot the draft and then upload it to my fantasy baseball application. Twenty seconds or something, right. But that's not. It was serving me banner ads. Right.
Tom Miller:
I had to turn on JavaScript in my bot because the login process was JavaScript based, and I needed to have that functionality, the cookie functionality, which is not dependent on JavaScript, depending on the browser. But I'm sure I violated the terms of service in that league.
Jason Rose:
You don't mean Kathy report to the commissioner? I am the commissioner. Now, heavy is the crown.
Tom Miller:
Exactly. Exactly. So, yeah, that's it. I mean, you know, this is sort of just a little sharing of a pretty fun project that I worked on for a while. But I'm not really sure of the, you know, the continued relevance of it, because now that I'm on the media side, what I'm most focused on, like some of these other sort of cynical marketers, is return on ad spend.
Tom Miller:
Right. And, you know, it sort of becomes a question of, do I care so much about this bot traffic? And I think it's incumbent upon players like Google and Facebook, who've been, you know, absolutely sort of, in my view, pure on this matter, to continue to be pure on the matter, to continue to set the standard for everyone else to follow.
Jason Rose:
Yeah, they want to end up on the right side of history as more and more of this shakes out. And we were a little part of you understood it was happening.
Tom Miller:
And we're on the right side of history on this and a lot of other Internet governance issues.
Tom Miller:
So it's a fascinating problem, really. Yeah, I think I feel like I talked about it for like an hour.
Jason Rose:
It's worth it. Really, really fascinating stuff. So I guess that's it for the third episode of Measured Direction. And guys, again, the place to submit questions that we can answer in coming weeks is bit.ly slash measured direction. So it's b i t dot l y slash measured direction. What's your Twitter handle? You can find me on Twitter at jaytrose, j a y t rose.
Tom Miller:
I'm at tmllr, t m l l r, and I'll see you on the #measure Slack as well.
Tom Miller:
Absolutely. On Twitter. Also if you'd like to submit a question there, put the hashtag measure direction. You can find us on SoundCloud and on iTunes.
Tom Miller:
That's right. SoundCloud, dot com slash measure direction, oddly enough.