Explanation of Megaupload Study (or: Econometrics 101)
Since we found that the shutdown actually boosted revenues meaningfully, there are naturally a number of people who don’t like the study and criticize it without even reading the abstract, let alone the paper.
The most common critique in comments on blogs and news articles is that “sales were increasing anyway because of (digital growth) / (new digital channels) / (blockbusters released in January) / (insert your favorite reason you think sales would have grown here).” I suppose people think that, as economists, we would not have thought of this.
I thought I’d explain the actual methodology of the study below and why it accounts for this… in other words, why you might find it compelling. But I’ll do so by analogy, without any econometrics equations.
What A Bad Study Would Look Like:
Imagine you wanted to know the effectiveness of a new medicine in treating the common cold. And imagine you had a good way to measure how bad a patient’s symptoms were each day. If you just took 100 people who had had the cold for 4 days and gave them your pill, and 2 days later they were all very improved, you could hardly release this as a study. People would (rightfully) say “colds usually last about 4 days, your patients all would have been recovering anyway! You guys are hacks (paid for by big pharmaceuticals).” That would be the equivalent of simply asking how sales changed around the world after the Megaupload shutdown. This is what people are claiming we did. It’s also precisely what we say in our half-page abstract that we did not do. So this blog post is to provide some more information to people whose response to our paper is “correlation is not causation.”
What Our Study Actually Did:
What you would really like is to split the patients randomly into two groups of 50, give half of them the medicine and give the other half an identical looking sugar pill. Then ask how the “treated” group compares in 2 days to the “control” group that got the sugar pill. That would be a reasonable study. As scientists, we would have loved it if the government randomly picked half the countries in the world to completely block access to Megaupload and left it untouched in the other countries – we could ask how sales changed in the (randomly chosen) blocked countries compared to the unblocked. If a new release came out in January that boosted sales, it would boost sales in both sets of countries so our estimate of the effect of the shutdown would not be biased. Even if you think that the new release came out in some countries but not others (in January), since they were chosen at random it should be coming out in approximately equal numbers of blocked and unblocked countries. That would be a great way to get a good estimate of the impact. The unblocked countries give you an estimate of how sales would have changed if not for the shutdown, and any change in the blocked countries over and above this might be attributed to the shutdown. Like a medical trial with control and treatment groups.
Unfortunately we didn’t have that experiment. But we had something that is similar and equally valid. Imagine that you couldn’t give any of your patients pure sugar pills, but you could give some patients pills that were 80% medicine and 20% sugar. And you could give some patients pills that were 60% medicine and 40% sugar. And some patients pills that were 20% medicine and 80% sugar. Imagine that before you gave them the pills, all groups of patients were recovering or not recovering at equal rates. So you have evidence that they are all following about the same recovery track. Then, immediately after you give them the pills, the people who got the 80% medicine pill have the highest amount of recovery. And the people who got the 60% medicine pill have reasonably high (but not as high) recovery. And the people who got the 20% medicine pill have the lowest amount of recovery. Given that the groups were following the same trend beforehand, you would have expected them to continue to do so, but *immediately* after getting the pill you observed a strong, significant positive correlation between “recovery” and “% medicine in the pill.” Would you not think that the most likely explanation for this was that the medicine has a causal effect in treating the cold? That’s why we call our correlation a causal impact.
Our situation was analogous. After controlling for various variables (including Christmas), we found that countries with high Megaupload use had similar sales trends to countries with low Megaupload use before the shutdown (the levels of sales were different, but the time trends were the same). Immediately after the shutdown, the sales changes were no longer the same. The sales change was positively correlated with the pre-shutdown amount of Megaupload use. Countries with high pre-shutdown Megaupload adoption had higher sales growth (or less loss) than countries with low adoption. Would you not now say that the most logical explanation for this immediate change from no correlation to a correlation is that the Megaupload shutdown causally affected sales?
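For readers who think in code rather than pills: here is a minimal sketch of the logic on simulated data. All numbers here are made up for illustration (the number of countries, the size of the effect, the noise level are assumptions, not figures from the study). The key point is that by regressing each country’s *change* in sales on its pre-shutdown “treatment intensity,” any growth that is common to all countries (January blockbusters, digital growth) lands in the intercept, while the slope picks up only the part of the change that varies with intensity.

```python
import numpy as np

rng = np.random.default_rng(0)

n_countries = 12
# Hypothetical pre-shutdown Megaupload penetration per country (the "dose")
intensity = rng.uniform(0.0, 1.0, n_countries)

true_effect = 2.5    # assumed effect of the shutdown, per unit of intensity
common_trend = 1.0   # growth shared by ALL countries (e.g. new January releases)

# First-difference each country's sales around the shutdown.
# The common trend shifts every country's change equally; only the
# shutdown's effect scales with pre-shutdown intensity.
delta_sales = common_trend + true_effect * intensity \
    + rng.normal(0.0, 0.3, n_countries)

# OLS of the sales change on treatment intensity (intercept + slope)
X = np.column_stack([np.ones(n_countries), intensity])
coef, *_ = np.linalg.lstsq(X, delta_sales, rcond=None)

print(f"intercept (common trend): {coef[0]:.2f}")
print(f"slope (estimated effect): {coef[1]:.2f}, true effect: {true_effect}")
```

Notice that the estimated intercept absorbs the common trend, so the slope estimate is close to the true effect even though every country’s sales grew for reasons unrelated to the shutdown. That is exactly why “sales were rising anyway” does not bias this kind of estimate.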
Is This 100% Proof?
Of course it is not 100% proven. Perhaps lots of invisible fairies *just happened to appear in January 2012* in countries with high Megaupload use and told consumers to start buying more movies. And some fairies appeared in medium-Megaupload countries and told consumers to start buying a few more movies. And no such fairies appeared in low-Megaupload countries. But how likely is this counter-explanation? Can you come up with a counter-explanation that is more likely than fairies? If so, please post here – we love exploring alternate theories to see whether or not they could explain our findings. We want to know the truth. We just can’t think of any reasonably likely counter-explanations yet. (Except the movie fairies!)
This methodology, known as a difference-in-differences technique that exploits treatment intensity, is very common in economics and is the basis for many studies in highly ranked peer-reviewed journals. By the way, when we requested the data from the studios, we did not tell them our methodology. We don’t believe they tampered with the data (or we wouldn’t publish). But even if they hypothetically had chosen to, they would have had to falsely increase their sales in countries like Spain and France (high Megaupload adoption) while lowering (or not changing) their sales in countries like the US and UK (low-Megaupload countries). That’s the only way one could hypothetically fake data to produce our results. Since they did not know what our methodology was, that hardly seems like the manner in which one would tamper with data, right? Lowering or ignoring sales in the US?