
Explanation of Megaupload Study (or: Econometrics 101)

As I’ve already blogged, Mike Smith and I released a study on the impact of the Megaupload shutdown on digital movie sales and rentals.

Since we found that it actually boosted revenues meaningfully, there are naturally a number of people who don’t like the study and criticize it without even reading the abstract, let alone the paper.

The most common critique in comments on blogs and news articles is that “sales were increasing anyway because of (digital growth) (new digital channels) (blockbusters released in January) (insert your favorite reason you think sales would have grown here).”  I suppose people think that as economists we would not have thought of this.

I thought I’d explain the actual methodology of the study below and why it accounts for this… in other words, why you might find it compelling.  But I’ll do so by analogy without any econometrics equations.

What A Bad Study Would Look Like:

Imagine you wanted to know the effectiveness of a new medicine in treating the common cold.  And imagine you had a good way to measure how bad a patient’s symptoms were each day.  If you just took 100 people who had had the cold for 4 days and gave them your pill, and 2 days later they were all very improved, you could hardly release this as a study.  People would (rightfully) say “colds usually last about 4 days, your patients all would have been recovering anyway!  You guys are hacks (paid for by big pharmaceuticals).”  That would be the equivalent of simply asking how sales changed around the world after the Megaupload shutdown.  This is what people are claiming we did.  It’s also precisely what we say in our half page abstract that we did not do.  So this blog post is to provide some more information to people whose response to our paper is “correlation is not causation.”

What Our Study Actually Did:

What you would really like is to split the patients randomly into two groups of 50, give half of them the medicine and give the other half an identical looking sugar pill.  Then ask how the “treated” group compares in 2 days to the “control” group that got the sugar pill.  That would be a reasonable study.  As scientists, we would have loved it if the government randomly picked half the countries in the world to completely block access to Megaupload and left it untouched in the other countries – we could ask how sales changed in the (randomly chosen) blocked countries compared to the unblocked.  If a new release came out in January that boosted sales, it would boost sales in both sets of countries so our estimate of the effect of the shutdown would not be biased.  Even if you think that the new release came out in some countries but not others (in January), since they were chosen at random it should be coming out in approximately equal numbers of blocked and unblocked countries.  That would be a great way to get a good estimate of the impact.  The unblocked countries give you an estimate of how sales would have changed if not for the shutdown, and any change in the blocked countries over and above this might be attributed to the shutdown.  Like a medical trial with control and treatment groups.
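For readers who prefer code to analogies, here is a minimal sketch of that idealized experiment. Everything in it is made up for illustration: the function names, the number of countries, the size of the "January blockbuster" shock, and the true effect are all assumptions, not numbers from our data. It shows only that a shock hitting every country cancels out when you compare the change in blocked countries to the change in unblocked ones.

```python
import random
from statistics import mean


def diff_in_diff(pre_treated, post_treated, pre_control, post_control):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(post_treated) - mean(pre_treated)
    control_change = mean(post_control) - mean(pre_control)
    return treated_change - control_change


def simulate(true_effect=8.0, common_shock=5.0, n_countries=200, seed=0):
    """Simulated (hypothetical) countries, blocked or not by coin flip."""
    rng = random.Random(seed)
    pre_t, post_t, pre_c, post_c = [], [], [], []
    for _ in range(n_countries):
        base = rng.uniform(50, 150)       # sales level differs by country
        blocked = rng.random() < 0.5      # random assignment to "treatment"
        pre = base + rng.gauss(0, 1)
        # The common shock (say, a January blockbuster) hits every country.
        post = base + common_shock + rng.gauss(0, 1)
        if blocked:
            post += true_effect           # only blocked countries get the effect
            pre_t.append(pre)
            post_t.append(post)
        else:
            pre_c.append(pre)
            post_c.append(post)
    return diff_in_diff(pre_t, post_t, pre_c, post_c)


estimate = simulate()
print(f"estimated effect: {estimate:.2f} (true effect in the simulation: 8.0)")
```

The estimate lands near the simulated true effect even though sales rose everywhere, because the unblocked countries absorb the common shock.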

Unfortunately we didn’t have that experiment.  But we had something that is similar and equally valid.  Imagine that you couldn’t give any of your patients pure sugar pills but you could give some patients pills that were 80% medicine and 20% sugar.  And you could give some patients pills that were 60% medicine and 40% sugar.  And some patients pills that were 20% medicine and 80% sugar.  Imagine that before you gave them the pills, all groups of patients were recovering or not recovering at equal rates.  So you have evidence that they are all following about the same recovery track.  Then, immediately after you give them the pills, the people who got the 80% medicine pill have the highest amount of recovery.  And the people who got the 60% medicine pill have reasonably high (but not as high) recovery.  And the people who got the 20% medicine pill have the lowest amount of recovery.  Given that the groups were following the same trend beforehand, you would have expected them to continue to do so, but *immediately* after getting the pill you observed a strong, significant positive correlation between “recovery” and “% medicine in the pill”.  Would you not think that the most likely explanation for this was that the medicine has a causal effect in treating the cold?  That’s why we call our correlation a causal impact.

Our situation was analogous.  After controlling for various variables (including Christmas), we found that countries with high Megaupload use had similar sales trends to countries with low Megaupload use before the shutdown (the levels of sales were different, but the time trends were the same).  Immediately after the shutdown, the sales changes were no longer the same.  The sales change was positively correlated with the pre-shutdown amount of Megaupload use.  Countries with high pre-shutdown Megaupload adoption had higher sales growth (or less loss) than countries with low adoption.  Would you not now say that the most logical explanation for this immediate change from no correlation to a correlation is that the Megaupload shutdown causally affected sales?
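The same logic can be sketched in a few lines of code. This is a toy simulation under stated assumptions, not the model from the paper: the usage shares, sales levels, noise, and effect size are invented, and the helper names (`ols_slope`, `simulate_countries`) are hypothetical. It checks two things: that sales changes are unrelated to Megaupload use before the shutdown, and that they line up with it afterward.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Slope from a one-variable least-squares fit of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den


def simulate_countries(effect_per_unit=0.5, n_countries=12, seed=1):
    """Simulated log sales in three periods: early, late (both pre), after."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_countries):
        usage = rng.uniform(0.0, 0.2)   # made-up pre-shutdown usage share
        level = rng.uniform(3.0, 6.0)   # sales levels differ across countries
        trend = 0.02                    # growth per period, same everywhere
        early = level + rng.gauss(0, 0.002)
        late = level + trend + rng.gauss(0, 0.002)
        # After the shutdown, the boost is proportional to prior usage.
        after = level + 2 * trend + effect_per_unit * usage + rng.gauss(0, 0.002)
        rows.append((usage, early, late, after))
    return rows


rows = simulate_countries()
usage = [r[0] for r in rows]
pre_change = [r[2] - r[1] for r in rows]    # change within the pre-period
post_change = [r[3] - r[2] for r in rows]   # change across the shutdown

pre_slope = ols_slope(usage, pre_change)    # should be close to zero
post_slope = ols_slope(usage, post_change)  # should be close to effect_per_unit
print(f"pre-shutdown slope:  {pre_slope:.3f}")
print(f"post-shutdown slope: {post_slope:.3f}")
```

Differences in sales levels drop out because each country is compared with itself, and the common trend drops out because it does not vary with usage. What remains after the shutdown is the part of the change that scales with how much Megaupload was used.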

Is This 100% Proof?

Of course it is not 100% proven.  Perhaps lots of invisible fairies *just happened to appear in January 2012* in countries with high Megaupload use and told consumers to start buying more movies.  And some fairies appeared in medium Megaupload countries and told consumers to start buying a few more movies.  And no such fairies appeared in low Megaupload countries.  But how likely is this counter-explanation?  Can you come up with a counter-explanation that is more likely than fairies?  If so, please post here – we love exploring alternate theories to see if they could explain our findings or not.  We want to know the truth.  We just can’t think of any reasonably likely counter-explanations yet.  (Except the movie fairies!)


This methodology, known as a difference-in-difference technique that exploits treatment intensity, is a very common methodology used in economics and is the basis for many studies in very highly ranked peer-reviewed journals.  By the way, when we requested the data from the studios, we did not tell them our methodology.  We don’t believe they tampered with the data (or we wouldn’t publish).  But even if they hypothetically would have chosen to, they would have had to falsely increase their sales in countries like Spain and France (high Megaupload adoption) while lowering (or not changing) their sales in countries like the US and UK (low Megaupload countries).  That’s the only way one could hypothetically fake data to produce our results.  Since they did not know what our methodology was, that hardly seems like the manner in which one would tamper with data, right?  Lowering or ignoring sales in the US?


Megaupload Shutdown Increased Sales of Digital Movies

Last week my coauthor and I released our study “Gone in 60 Seconds: The Impact of the Megaupload Shutdown on Digital Movie Sales”.  In the study we attempt to estimate the causal impact that shutting down Megaupload had on digital sales and rentals for 2 major movie studios.  Using a difference in difference methodology, we find that digital revenues for the 2 studios were 6-10% higher than they would have been without the shutdown, in the 12 countries we studied for the 18 weeks following the shutdown.  Note the italics.  We don’t simply look at the sales change after the shutdown.  We have an entire model dedicated to trying to tease out the causal effect.  The intuition of the model is based on common sense – if the shutdown had any impact at all, then we would expect sales growth (after the shutdown) to be larger for countries where many consumers used Megaupload and smaller for countries where a lower percent of consumers were using Megaupload.  After all, in a country where no one pirated at all, the shutdown shouldn’t have had much impact at all.  We do a lot more with the model than just this, but that’s the intuition behind our claims of causation.

The finding that revenue was increased 6-10% by the shutdown is important for a few reasons.  First, it establishes that cyberlocker movie piracy really did harm digital movie sales, a point that has been contested.  Second, it has been said that shutting down piracy websites is like playing a game of whack-a-mole – shut one down and another pops up.  We think it’s not so simple.  We think that firms compete with piracy, and the disadvantage to the firms is that piracy is free.  One thing firms can do to compete is to make their products more desirable, more convenient, more available, more reliable, attractively priced, etc.  But free is a tough price to compete with.  So our study says that policies that make illegal filesharing channels less convenient can impact marginal consumers and make it easier for firms to “compete with free.”

The Wall Street Journal covered our study nicely.  I actually was interviewed on Bloomberg TV, although that was more of a fun interview about what Megaupload was all about than about the implications of our study.  The study is currently under peer review at a good econ journal.

What’s interesting has been the reaction to the study.  News sources and even covered our study quite objectively.  But commenting on blogs has often been negative.  I think our study is being misinterpreted as “all anti-piracy measures are great” and “all pirates are stealing and should be stopped.”  That is really not the point of our study at all. First, we are only measuring the potential benefits (in one channel for one form of media)… these benefits should be weighed against the costs of anti-piracy interventions. 

More importantly, we think the interpretation of our study is different, as we note in our blog post announcing the study.  Sure, there are probably people who will only ever pirate (but many of them wouldn’t have bought anyway, so they are not a loss to content industries).  But there are many pirates *who will buy if the content on legal channels compares favorably to the content on illegal channels* on dimensions like 1) what is available  2) when it’s available  3) how convenient it is  4) how reliable it is, etc…

So what we’re saying is that part of the story is that making legal content more appealing *can* allow firms to compete with piracy.  But the other side of the story is that piracy is free, and free is a very attractive price.  Policies or interventions that make piracy less convenient can also help to swing marginal consumers from illegal channels into legal ones.  The question is which policies or interventions are worthwhile from a cost-benefit standpoint.

We’ll be doing more research into this.

Will the Copyright Alert System Work?

So the long-heralded Copyright Alert System (sometimes incorrectly referred to as the “six strikes program”) has finally launched in the US.  If you don’t know what this means, essentially five of the major Internet Service Providers in the US have agreed to allow monitoring of users’ Internet connections in search of copyright infringing material.  The first two times you are caught downloading illegal content you will get educational alerts (letting you know what you did was illegal), the second two times you will get acknowledgement alerts (“acknowledge that you did this and that it was illegal”), and the third two times (“strikes” five and six) some minor penalties will be imposed upon you to mitigate your behavior.  The nature of these penalties is unclear, but an example might be lowering the speed of your connection.  At this time, there are no plans for any warnings after the sixth nor any greater penalty.

I think this is a fascinating contrast to HADOPI, the “three strikes law” in France whereby users could be disconnected from the Internet for up to one month after their third strike.  HADOPI was imposed by the government, whereas the Copyright Alert System is actually a voluntary cooperation between the ISPs.  And while HADOPI’s final penalty is fairly severe, the penalties under the Copyright Alert System are less so, and after 6 infractions, at least at this point, it looks like there are no more penalties at all.

These two points make the Copyright Alert System fascinating to a researcher like me.  My prior research shows that HADOPI had a significant positive impact on digital music sales when it made headlines in France (even before warnings started going out).  But, even though the impact occurred before any penalties were applied, everyone knew the penalties were severe.  In the case of Copyright Alert System, people may be less worried about the penalties, less convinced that they will actually be applied (since it’s not a law), or even anxious to get to strike 7 and beyond where there may not be penalties. 

On the other hand, perhaps some consumers will see the warnings (something they haven’t been subjected to before) and be dissuaded by them.  I can certainly imagine a type of consumer for which this could be true (probably the same type of consumer who turned to legal channels in France after hearing about HADOPI).  Forbes has an article about the alert system with a quote from the editor-in-chief of, where he basically says that he doesn’t think this program is actually bad for consumers.  There’s something to learn from that – this program seems to be viewed as relatively more benign than HADOPI.  So it will be really interesting to see if it can have a serious impact.  I’d imagine that we might be able to measure any impact using methods similar to those we used for HADOPI, although I also have another potential idea for how to measure the impact if I can get some anonymous aggregated data from the ISPs.  I think it’s important to figure out if relatively benign, voluntary responses like this can work and how they compare to legal responses such as HADOPI or shutting down sites like Megaupload.

Piracy and Internet Search – The Debate

A few weeks ago, Google agreed (after a long period of pressure) to give lower priority in search results to sites that have received take down notices for containing copyrighted content (i.e. piracy sites).  Whether they should have to do so or not is actually a pretty interesting debate.  Ignoring a few of the intricacies of this particular case (I’m aware of them, but let’s keep it simple), here are the major arguments I see on both sides.

1)  Google makes money when you use their search.  They do so indirectly, but there is certainly a profit from search advertising.  If people use Google to find pirated content, then Google is profiting from providing search for (and a connection to) pirated content.  Some argue that this makes them as guilty as a cyberlocker that actually hosts the content, although this is debatable. But it is not debatable that Google has a profit-incentive to link to popular pirated content.

2)  However, Google’s search algorithm is theoretically agnostic (although it is alleged that they prioritize their own products and businesses).  It’s just following the best principles that their search engineers came up with for all searches.  If that puts a piracy site to the top, it’s simply due to an objective search algorithm.  Should we suppress this?

3)  Here’s the real question – does anyone really need Google to find the content that they want?  Do users go to Google just because it’s the easiest way to find the link to the content that they want at, say, The Pirate Bay (oops, better not link to that one!)?  Or at  Or do they really not know that these sites exist, and without Google they would not get to the content?  This is actually an interesting empirical question – after Google has implemented this new prioritization that gives lower priority to piracy sites, will piracy actually go down and sales go up?  Or is this a useless measure?

So far I’ve shown evidence that laws aimed at deterring consumers from filesharing can increase music sales, and I’m about to put out some work showing that shutting down a major cyberlocker increased movie sales.  But I have to admit that in spite of the enormous market share of Google, I’m skeptical that this particular policy change will have any effect.  That said, I’d much rather go with an empirical answer than my gut.

Spotify comments on “windowing” by artists

One of the latest ideas in the debate on streaming media is the concept of windowing.  When Coldplay released their album Mylo Xyloto (awesome album, in my opinion), they were criticized by some (and praised by others) for not making it available on Spotify.  This naturally plays in with the whole debate over whether streaming music promotes sales of that music or cannibalizes sales.

It turns out, Coldplay was not completely against streaming.  They were practicing a “windowing” strategy… they made the album available first only on sales channels, but have more recently made it available for streaming.  This has some similarities to book publishers releasing a book in hardcover, then trade paperback, or movie studios releasing movies in the theater, then on DVD, then on cable television.

Ken Parks at Spotify has come out aggressively against this windowing approach.  Here is an article about it.

I think it’s really relevant right now to figure out these relationships between streaming and sales, and if they are different for different artists, during different windows in the release cycle, etc…


Reel Piracy: Take Two

A month ago I posted about a new working paper that I finished with Joel Waldfogel.  The paper is entitled “Reel Piracy: The Effect of Online Film Piracy on International Box Office Sales”.

It seems that a number of online blogs have picked up on a small portion of the paper that employs a weaker methodology than the main section of the paper, citing our study as evidence that piracy does not displace sales in the U.S. box office.  It even seems to have picked up some press as a result.

We feel as if our findings are being misinterpreted, or at least a bit misrepresented.  Here is my coauthor’s blog post about the issue at Digitopoly.


**Update – Looks like responsible reporters actually read the study and got it right.  Here’s the article in the Wall Street Journal  –

And here’s the article in the LA Times  –



Spotify: Panacea or Plague?

I recently blogged about how Spotify opened up their code to app developers and will be able to offer many of the things that people want when experiencing music.  But, the news isn’t necessarily all good.

A big debate right now is whether legal streaming of music, on services like Spotify, is good for the industry or not… and whether it is good for individual artists or not (these are not necessarily the same question!).  Coldplay recently refused to release their newest album on Spotify, hoping instead for people to buy the album.  Adele did the same.  And a number of independent labels removed all of their content from Spotify.

There are a number of long-term questions here worth asking about whether the move to streaming is inevitable (do people want to own music anymore?) and whether the music industry can continue to exist in that form.  Currently artists seem to get a much smaller profit margin out of streaming than legal sales, although streaming in the U.S. is still a small fraction of the market and one cannot compare the profits from a single instance of streaming to the profits from a sale (a purchased file is listened to many times).

What I’m interested in is perhaps more short- or medium-term right now… as services like Spotify are adopted more widely, does this lead overall to more or less music purchasing?  It may lead to less because streaming is a substitute for buying.  But it may lead to more because streaming is also a complement – people who stream music are exposed to more music, can sample songs they like, and may pass on information to their friends.  And people who choose to stream legally have chosen to do so instead of acquiring the music through illegal filesharing.  I think it would benefit music labels to have an answer to this question as they consider whether to license their music out to various new services.

More to come as I begin to research this question…  Thoughts?