
Explanation of Megaupload Study (or: Econometrics 101)

March 13, 2013

As I’ve already blogged, Mike Smith and I released a study on the impact of the Megaupload shutdown on digital movie sales and rentals.

Since we found that it actually boosted revenues meaningfully, there are naturally a number of people who don’t like the study and criticize it without even reading the abstract, let alone the paper.

The most common critique in comments on blogs and news articles is that “sales were increasing anyway because of (digital growth) (new digital channels) (blockbusters released in January) (insert your favorite reason you think sales would have grown here).”  I suppose people think that as economists we would not have thought of this.

I thought I’d explain the actual methodology of the study below and why it accounts for this… in other words, why you might find it compelling.  But I’ll do so by analogy without any econometrics equations.

What A Bad Study Would Look Like:

Imagine you wanted to know the effectiveness of a new medicine in treating the common cold.  And imagine you had a good way to measure how bad a patient’s symptoms were each day.  If you just took 100 people who had had the cold for 4 days and gave them your pill, and 2 days later they were all much improved, you could hardly release this as a study.  People would (rightfully) say “colds usually last about 4 days, your patients all would have been recovering anyway!  You guys are hacks (paid for by big pharmaceuticals).”  That would be the equivalent of simply asking how sales changed around the world after the Megaupload shutdown.  This is what people are claiming we did.  It’s also precisely what we say in our half-page abstract that we did not do.  So this blog post is to provide some more information to people whose response to our paper is “correlation is not causation.”

What Our Study Actually Did:

What you would really like is to split the patients randomly into two groups of 50, give half of them the medicine and give the other half an identical looking sugar pill.  Then ask how the “treated” group compares in 2 days to the “control” group that got the sugar pill.  That would be a reasonable study.  As scientists, we would have loved it if the government randomly picked half the countries in the world to completely block access to Megaupload and left it untouched in the other countries – we could ask how sales changed in the (randomly chosen) blocked countries compared to the unblocked.  If a new release came out in January that boosted sales, it would boost sales in both sets of countries so our estimate of the effect of the shutdown would not be biased.  Even if you think that the new release came out in some countries but not others (in January), since they were chosen at random it should be coming out in approximately equal numbers of blocked and unblocked countries.  That would be a great way to get a good estimate of the impact.  The unblocked countries give you an estimate of how sales would have changed if not for the shutdown, and any change in the blocked countries over and above this might be attributed to the shutdown.  Like a medical trial with control and treatment groups.
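To make the arithmetic concrete, here is a minimal sketch in Python of the comparison described above. The numbers are made up for illustration and are not data from the study; the group names and function names are likewise hypothetical. It contrasts the "bad study" estimate (how did sales change in blocked countries?) with the treatment-versus-control estimate (how did they change relative to unblocked countries?).

```python
# Hypothetical average weekly sales (arbitrary units) before and after a
# shutdown. These figures are invented for illustration only.
sales = {
    "blocked":   {"before": 100.0, "after": 118.0},
    "unblocked": {"before": 200.0, "after": 220.0},
}


def pct_change(before, after):
    """Proportional change from the 'before' period to the 'after' period."""
    return (after - before) / before


def naive_estimate(data):
    """The 'bad study': look only at how the blocked group changed."""
    b = data["blocked"]
    return pct_change(b["before"], b["after"])


def diff_in_diff(data):
    """Change in the blocked group minus change in the unblocked group.

    The unblocked group stands in for what would have happened anyway
    (new releases, general digital growth, and so on).
    """
    b = data["blocked"]
    u = data["unblocked"]
    return pct_change(b["before"], b["after"]) - pct_change(u["before"], u["after"])


naive = naive_estimate(sales)  # includes growth that would have happened anyway
did = diff_in_diff(sales)      # growth over and above the control group

print(f"naive before/after change: {naive:.2%}")
print(f"difference-in-differences: {did:.2%}")
```

In this toy example the blocked group grows 18%, but the unblocked group grows 10% over the same period, so only the remaining 8 percentage points might be attributed to the shutdown.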

Unfortunately we didn’t have that experiment.  But we had something that is similar and equally valid.  Imagine that you couldn’t give any of your patients pure sugar pills but you could give some patients pills that were 80% medicine and 20% sugar.  And you could give some patients pills that were 60% medicine and 40% sugar.  And some patients pills that were 20% medicine and 80% sugar.  Imagine that before you gave them the pills, all groups of patients were recovering or not recovering at equal rates.  So you have evidence that they are all following about the same recovery track.  Then, immediately after you give them the pills, the people who got the 80% medicine pill have the highest amount of recovery.  And the people who got the 60% medicine pill have reasonably high (but not as high) recovery.  And the people who got the 20% medicine pill have the lowest amount of recovery.  Given that the groups were following the same trend beforehand you would have expected them to continue to do so, but *immediately* after getting the pill you observed a strong, significant positive correlation between “recovery” and “% medicine in the pill”.  Would you not think that the most likely explanation for this was that the medicine has a causal effect treating the cold?  That’s why we call our correlation a causal impact.

Our situation was analogous.  After controlling for various variables (including Christmas), we found that countries with high Megaupload use had similar sales trends to countries with low Megaupload use before the shutdown (the levels of sales were different, but the time trends were the same).  Immediately after the shutdown, the sales changes were no longer the same.  The sales change was positively correlated with the pre-shutdown amount of Megaupload use.  Countries with high pre-shutdown Megaupload adoption had higher sales growth (or less loss) than countries with low adoption.  Would you not now say that the most logical explanation for this immediate change from no correlation to a correlation is that the Megaupload shutdown causally affected sales?
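The logic of that paragraph can be sketched in a few lines of Python. Everything here is an assumption for illustration: the five unnamed countries, their pre-shutdown Megaupload usage figures, and their sales growth rates are invented, and the actual study includes controls and statistical tests that this sketch leaves out. The point is only to show the two checks: growth should be unrelated to Megaupload use before the shutdown, and related to it afterwards.

```python
def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical pre-shutdown Megaupload usage for five made-up countries
# (a stand-in for "treatment intensity", like % medicine in the pill).
megaupload_use = [0.02, 0.05, 0.10, 0.15, 0.20]

# Hypothetical sales growth between two windows that are both BEFORE the
# shutdown: roughly the same everywhere, i.e. the same time trend.
growth_pre = [0.031, 0.029, 0.030, 0.031, 0.029]

# Hypothetical sales growth from just before to just after the shutdown:
# now larger where pre-shutdown Megaupload use was higher.
growth_post = [0.034, 0.040, 0.050, 0.060, 0.070]

# Placebo check: before the shutdown, growth should not track usage.
pre_slope = ols_slope(megaupload_use, growth_pre)

# Main estimate: after the shutdown, does growth rise with usage?
post_slope = ols_slope(megaupload_use, growth_post)

print(f"slope before shutdown: {pre_slope:+.4f}")
print(f"slope after shutdown:  {post_slope:+.4f}")
```

A slope near zero before the shutdown and a clearly positive slope after it is the pattern the paragraph above describes: no correlation turning into a correlation at exactly the moment of the shutdown.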

Is This 100% Proof?

Of course it is not 100% proven.  Perhaps lots of invisible fairies *just happened to appear in January 2012* in countries with high Megaupload use and told consumers to start buying more movies.  And some fairies appeared in medium Megaupload countries and told consumers to start buying a few more movies.  And no such fairies appeared in low Megaupload countries.  But how likely is this counter-explanation?  Can you come up with a counter-explanation that is more likely than fairies?  If so, please post here – we love exploring alternate theories to see if they could explain our findings or not.  We want to know the truth.  We just can’t think of any reasonably likely counter-explanations yet.  (Except the movie fairies!)

Validity:

This methodology, known as a difference-in-differences technique that exploits treatment intensity, is a very common methodology used in economics and is the basis for many studies in very highly ranked peer-reviewed journals.  By the way, when we requested the data from the studios, we did not tell them our methodology.  We don’t believe they tampered with the data (or we wouldn’t publish).  But even if they hypothetically had chosen to, they would have had to falsely increase their sales in countries like Spain and France (high Megaupload adoption) while lowering (or not changing) their sales in countries like the US and UK (low Megaupload adoption).  That’s the only way one could hypothetically fake data to produce our results.  Since they did not know what our methodology was, that hardly seems like the manner in which one would tamper with data, right?  Lowering or ignoring sales in the US?



3 Comments
  1. Brett,

    Thanks for this explanation – it helps people like me who managed to avoid statistics every time some professor tried to sneak it into a college course I was taking.

    But soon enough you will learn that it doesn’t do much good and therefore is probably not worth the bother. The mob will always find a reason to debunk the best science we have available (even though it’s not perfect) if it doesn’t fit their worldview.

  2. John Smith

    > Since they did not know what our methodology was, that hardly seems like the manner in which one would tamper with data, right? Lowering or ignoring sales in the US?

    Depends what the alternatives are. If they don’t stand to lose significantly from it, and it can be reasonably foreseen that someone would attempt to apply a standard methodology to it, then of course they’re going to lie just as a matter of habit.

    What else could have been done with the data where altering the data would have harmed them politically – given whatever explanation you gave them for the request and the time-frame in which you did so?

    If this looks like working out well for them, and there’s not a major problem for them in faking, then of course they’re going to do it – assuming they’re not idiots of course.

    • Hi John –

      Assuming for a moment that we accept your ultimately cynical view of the world “if lying will make the results look good then *of course* they will lie”, I think you have to consider that “standard methodology” means different things to different people.

      To an economist, a diff-in-diff is standard, although a diff-in-diff where one of the differences is a continuous “treatment intensity” variable is a bit less so. Even then, figuring out how to apply this stuff is the entire point of research – and until we did this study, no one had thought to look at it this way. The chances of someone foreseeing this are not high – if they could have foreseen it, one would think they might have done it.

      The only request we made to the studios was for weekly digital sales/rentals data for a period of time. I feel confident that they actually would not have known how to appropriately manipulate the data a priori and that the necessary manipulations would have seemed counterintuitive to them (even now most people who comment on this don’t seem to understand our methodology).

      But I do understand your concern. To tell you the truth, this is the catch-22. If you want to live in a paranoid world where you don’t trust the studios and you don’t trust the judgement of academic researchers on data quality, then the only research you can trust is research done with publicly available, verifiable data. But many questions (particularly ones I am interested in) can only be answered with private, proprietary data that no company would want to make publicly available. It’s a pretty common issue in empirical industrial organization. I feel confident in the data and I do a lot of work to ensure I’m getting good data, but the point of research is to inform your opinion – not to force one on you.
