Measuring the impact of advertising should be easy: "Did showing my ads affect customers' behaviour, relative to not showing them?" We have developed a technology that facilitates this simple comparison by identifying the users who would have been exposed. These are the control-group counterparts of the exposed users in a randomised experiment. In this article, we discuss the advantages of this technology over other experimental methods and its importance for attribution modelling.

May 2015

A recent article here on Think with Google discussed how to measure true marketing impact using controlled experiments. In this article, we dive deeper into the challenge of running online ad experiments that are accurate and precise.

The Challenge of Measuring Ad Effectiveness

Many advertisers think that evaluating the impact of advertising is easy - you just compare the outcomes of one campaign with another, right? Unfortunately, neither simple comparisons nor more complex attribution models always give proper credit to the ads: not all correlations are causal. Experiments with treatment and control groups are the scientific gold standard for finding out what works and should play an important part in an advertiser's attribution strategy. So, what does the ideal advertising experiment look like?

Features of Accurate Ad Experiments

A scientific advertising experiment has two main features: a well-defined target group and precise control over who is allowed to see the ads. An ad campaign never reaches all users within the target audience, because whether a user sees a particular ad depends on their behaviour, competitive bidding among advertisers, and numerous targeting parameters. As a result, the users reached can differ radically from those not reached. To measure advertising effectiveness, we want to make a simple comparison: "Did showing the ads change users' behaviour, relative to not showing them?" To do so, we randomly divide our target users into two groups: we try to show ads to one and withhold ads from the other. In scientific terms, this creates a treatment group of actually exposed users and a control group of users whom we would have exposed but chose not to. This is the easy comparison of exposed versus would-have-been-exposed users.
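The random split described above is often implemented as a deterministic, hash-based assignment so that each user always lands in the same group. The sketch below is purely illustrative; the function name, experiment salt, and 50/50 split are assumptions, not a description of any production system.

```python
import hashlib

def assign_group(user_id: str, experiment: str = "campaign_2015") -> str:
    # Hash the user id together with an experiment-specific salt so the
    # same user always gets the same group, but assignments are
    # independent across experiments. (Illustrative sketch only.)
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"
```

Because assignment depends only on the user id and the salt, any server can compute a user's group without shared state, and re-running the assignment never shuffles users between groups mid-experiment.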


Approach 1: PSA/Placeholder Campaigns

Now, you may be thinking, "I know about placebo testing, it's easy. I just split my targeting list in half and set up a second campaign with a placeholder ad that is unrelated to the advertiser - such as an ad for a charity (i.e., a public service announcement or PSA)." Under cost-per-mille (CPM) pricing and a very simple ad server, this can work, but it comes at the media cost of serving the placeholder ads, borne by the advertiser or publisher. However, modern ad servers that use cost-per-click (CPC) and cost-per-action (CPA) pricing and creative optimisation distort these so-called "PSA tests." For example, if the ad network chooses to show higher-revenue ads more often, any difference in click-through rate (CTR) between the advertiser's ads and the placeholder ads will cause the higher-CTR creatives to be shown more often. Further, the ad network will show ads to the types of users who are most likely to click. And the users who choose to click on an ad for sporting goods or apparel are likely to be quite different from those who click on an ad for a charity - leading to a comparison of "apples and oranges." Hence, such PSA testing can produce wrong results, ranging from overly optimistic to falsely negative.


Approach 2: Intent-to-Treat

There is another way to analyse experiments without a placeholder ad campaign: just ignore all exposure information in both the treatment and control groups. For example, if we randomly divide our target audience into two lists and show ads to only one, we can compare the behaviour of all users on each list, not just those who saw an ad, to measure the causal effect of the advertising. By comparing all users, regardless of their opportunities to see the ads, we still properly compare the exposed users with the would-have-been-exposed users, but we cloud the comparison with added noise from users who were not, or would not have been, exposed. This method - called "intent-to-treat" - is scientifically clean, comparing apples to apples. But the added noise from including unreached users who were never really part of the experiment is often quite large, leaving us wishing for a better solution.
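A toy simulation makes the dilution concrete. All the numbers below (10% reach, 2% baseline conversion, 1 percentage point of lift among reachable users) are assumptions chosen for illustration: because only a tenth of each list can ever see the ad, the intent-to-treat difference between the full lists comes out near one tenth of the true lift, buried in noise from the unreached 90%.

```python
import random

def simulate_itt(n=100_000, reach=0.10, base=0.02, lift=0.01, seed=1):
    # Toy model: a `reach` fraction of users would be served the ad;
    # among them, treatment raises conversion from `base` to `base+lift`.
    rng = random.Random(seed)
    conv = {"treatment": 0, "control": 0}
    count = {"treatment": 0, "control": 0}
    for _ in range(n):
        group = rng.choice(["treatment", "control"])
        reachable = rng.random() < reach
        p = base + (lift if group == "treatment" and reachable else 0.0)
        conv[group] += rng.random() < p
        count[group] += 1
    # Intent-to-treat: compare ALL users in each group, exposed or not.
    return conv["treatment"] / count["treatment"] - conv["control"] / count["control"]
```

Under these assumed numbers the true diluted effect is reach × lift = 0.001, an order of magnitude smaller than the 0.01 lift actually experienced by exposed users.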


Ghost Ads: An Ideal Solution for Ad Experiments

The ideal solution combines the best of both worlds: apples-to-apples comparisons on the users who were actually reached by the advertiser's campaign. We call this technology "ghost ads." Ghost ads record when we wanted to show the ad, ensuring that control-group users who would have been exposed are comparable to the actually exposed treatment-group users. Because ghost ads do not require a second creative, the ad server handles treatment and control users identically, even under CPC or CPA pricing, avoiding the distortions that plague PSA tests. And since they allow filtering out users who never had a chance to see the ad, ghost ads let us measure undistorted uplift effects with a precision that exceeds that of intent-to-treat by up to 50 times.
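A back-of-envelope calculation suggests where a precision gain of this order comes from. Under simplified assumptions (roughly equal outcome variance across groups, and lift only among reachable users), filtering to exposed and would-have-been-exposed users shrinks the sample size needed for a given statistical power by a factor of one over the reach rate - which is 50 when a campaign reaches 2% of the target list.

```python
def precision_gain(reach: float) -> float:
    # Sketch under simplified assumptions, not an exact derivation:
    #   ITT estimate:   diluted effect reach*lift, noise from all n users
    #                   -> z_itt is proportional to reach * lift * sqrt(n)
    #   Exposed-only:   full effect lift, noise from reach*n users
    #                   -> z_ghost is proportional to lift * sqrt(reach * n)
    # Required sample size scales as 1/z^2, so the gain is the squared
    # z ratio: (z_ghost / z_itt)^2 = 1 / reach.
    return 1.0 / reach
```

For example, `precision_gain(0.02)` gives 50 and `precision_gain(0.10)` gives 10, so the gain is largest exactly when reach is low and intent-to-treat is at its noisiest.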

Ghost ads have additional benefits. First, neither the advertiser, the ad network, nor the publisher incurs the cost of serving PSAs. Second, with a costless control group, more complex and informative experiments can be designed. Third, rather than comparing the advertiser's ad with an irrelevant artificial ad, the experiment measures against the relevant competitive baseline: would-have-been-exposed users see whatever ads naturally appear when the advertiser turns off the ads. Fourth, ghost ads that record when the user sees the ad (e.g., ad viewability) help us discard the noise from users who were delivered an ad but did not actually see it.


Looking Ahead

Building the ideal is not easy. As with PSA testing, many pitfalls can arise. New technologies such as ad viewability, real-time bidding, retargeting/remarketing, and user personalisation add engineering complexity and require rigorous validation. Google is known for investing in new technologies, and ghost ads are one such investment. A case study with DefShop, a European online specialty clothing retailer, demonstrates this new advertising measurement solution.

Ghost ads will help validate and improve existing attribution models and create a revolution in advertising effectiveness measurement. We hope the industry will embrace ghost ad experiments as a key technology in evaluating and optimising ad spending.