Feds go overboard in prosecuting information activist
Aaron Swartz speaking at the Freedom to Connect conference in Washington, DC, on May 21, 2012.
When I was in grad school, we had a "visitor" WiFi network available to people visiting campus. The network was only supposed to be used by guests; access was automatically cut off after two weeks to force students and staff to register for the main campus network. But registering was a bit of a hassle, so when I first got to campus I simply used the visitor network. When my two weeks ran out, I was still too lazy to register—so I spoofed my media access control (MAC) address and got another two weeks of free access.
Maybe that's why the federal government's aggressive prosecution of activist Aaron Swartz for "hacking" activities that include MAC address spoofing makes me so uncomfortable. A year ago, we wrote about the indictment of Swartz for spidering millions of academic papers from the JSTOR subscription archive. Now, the federal government has unsealed a new indictment, increasing the number of charges against Swartz from four to thirteen. If convicted of all charges, Swartz could be sentenced to decades in prison.
The document liberation front
Many universities pay hefty subscription fees to provide their users unlimited access to archives like JSTOR. Most non-academics pay by the article. Swartz, who was a fellow at Harvard University in the fall of 2010, was apparently unhappy about this situation and so joined neighboring MIT's WiFi network as a guest and began rapidly downloading JSTOR documents. He reportedly got 4.8 million of them.
When JSTOR blocked his IP address, Swartz allegedly connected with a different IP address. When MIT then cut off his laptop from the network, Swartz allegedly changed his MAC address to allow him to regain access. Eventually, the government says that Swartz entered an MIT networking closet and plugged his laptop directly into the campus network.
The updated indictment describes the scene when Swartz returned to the closet a few days later to pick up his laptop: "Swartz held his bicycle helmet like a mask to shield his face, looking through ventilation holes in the helmet. Swartz then removed his computer equipment from the closet, put it in his backpack, and left, again masking his face with the bicycle helmet before peering through a crack in the double doors and cautiously stepping out."
Abusing the Computer Fraud and Abuse Act
Congress passed the Computer Fraud and Abuse Act (CFAA) in 1986 to deal with the then-new problem of malicious computer hacking. Because the law was passed when the Internet was still in its infancy, the exact scope of its provisions remains murky today. For example, there have been cases of employers suing employees under the CFAA for using their employer-provided credentials to access information on the corporate intranet that wasn't intended for them.
In 2008, the government prosecuted a woman under the CFAA after her "cyber-bullying" of a teenager contributed to the teen's suicide. The government argued that the woman's actions violated the MySpace user agreement, and therefore constituted unauthorized access to MySpace servers. The woman was convicted, but her conviction was later thrown out by the trial judge.
The government seems to be making a similar argument in the Swartz case. It says he violated the CFAA when he "intentionally accessed computers belonging to MIT and JSTOR without authorization, and thereby obtained from protected computers information whose value exceeded $5,000—namely, digitized journal articles from JSTOR's archive." By breaking Swartz's actions up into five different date ranges and charging him under two different sections of the CFAA for each, the government has ginned up a total of 10 counts, each of which is theoretically punishable by five years in prison. For good measure, they also charged Swartz with one count of "recklessly damaging" a computer under the CFAA and two counts of wire fraud.
It's a stretch to say that Swartz gained unauthorized access to JSTOR's servers. Initially, he did have authorization to access both the network and the JSTOR website. But according to the indictment, "each user must agree and acknowledge that they cannot download or export contents from JSTOR's computer servers with automated computer programs such as Web robots, spiders, or scrapers." The government seems to believe that once Swartz ran afoul of this contractual requirement, he became an unauthorized user and therefore a felon under the CFAA.
But treating the violation of such use restrictions, or the evasion of efforts to enforce them, as a felony is overkill. Automated crawling of websites is an extremely common activity that can have social benefits. While crawling a public (or, in the case of JSTOR, semi-public) website against the wishes of its owner is generally bad manners, it's hardly comparable to hacking into someone's computer to access private information.
And as security researcher Chris Soghoian has pointed out, website terms of service impose a wide variety of requirements, some of which are routinely ignored by users. Criminalizing such violations is a bad idea.
Keeping a sense of perspective
It's not clear that Swartz's actions caused any significant harm. While the indictment asserts that Swartz's spidering disrupted other users' access to the JSTOR site, it does not give any quantitative details about the extent of disruption. The most significant harm was likely JSTOR's decision to cut off access to the entire MIT campus for several days to stop Swartz's downloads, which was not Swartz's idea.
The government alleges that Swartz planned to distribute the documents on peer-to-peer networks. Swartz reportedly penned a 2008 Guerilla Open Access Manifesto that argued that "we need to download scientific journals and upload them to file sharing networks." But if this was Swartz's plan, he never carried it out. He has reportedly surrendered all copies of the downloaded files. And in any event, distributing copies of copyrighted works would be an offense under copyright law, not the CFAA.
JSTOR, the alleged victim, tells Ars Technica that it did not seek Swartz's prosecution and has only participated in the case as a subpoenaed witness. Of course, the government doesn't need a victim's permission to bring a criminal case against a defendant, but if JSTOR didn't feel Swartz's actions merited criminal prosecution, it seems like overkill for the government to pursue the case anyway.
Swartz clearly has history on his side. The current model for distributing academic works, in which academics in many fields surrender their copyrights to commercial publishers who re-sell them at a steep mark-up (the academics themselves are generally not paid for the work), is fundamentally broken. Swartz's actions caused no permanent damage, and he was trying to call attention to a very real problem.
This isn't to say that Swartz is wholly innocent. Assuming the facts in the indictment are true, Swartz is something like a digital trespasser. Under Massachusetts law, such trespassing is punishable by a $100 fine and up to 30 days in prison. That seems about right: if he's going to serve prison time, it should be measured in days rather than years.
Also, I'm curious: if they want to see them distributed as widely as possible, how do they wind up behind paywalls like JSTOR? I've tried finding free copies of some pieces when I needed them for research, and it was often next to impossible... I just had to truck it down to campus (then) or work (now) to get access.
Or are they transferring the rights to journals in exchange for publication, which is why I said "rightsholders" and not "authors"?
Not sarcasm, btw; I really don't have a firm grasp of how the money stream in academic publishing works.
When I started graduate work, everything was published in printed form. Getting an article written, reviewed, accepted, and published all took time and lots of ink and paper. While you subconsciously knew that someone was paying for all those journals in the library, it wasn't you, so to you the information WAS freely available.
As an author, you always got a set of courtesy "reprints" from the publisher. If someone didn't have access to the journal in their library, they asked you for a reprint and you sent them one gratis (a lot of institutions had pre-printed reprint request cards for their faculty to use, all ready for mailing). You always kept one copy yourself, and duplicated it if you ran out. So the information was STILL freely available (since your institution also paid the postage). The unwritten professional rule of always honouring reprint requests ensured that, should you run into the one journal your library didn't have, you could still get a copy for free. Journals were split between those produced and printed by professional societies (although the actual printing/publishing might well be contracted out) and commercial scientific publishers.
As costs went up and revenue at academic institutions went down, pressure on library budgets increased, but the reprint system - not to mention the increasing availability of personal computers, software, small printers, and better photocopiers - kept most people happy. Marginal publications either disappeared or were bought by larger publishers, subscription "packages" and library consortia grew more common, and there were grumblings, but there really was no practical alternative: if you wanted the validation of having your work peer-reviewed and put out for the world to see, you still had to go through a print journal to do so. Once accepted and published, however, more people could find the abstract electronically and contact you by e-mail to request a copy. You'd still get courtesy copies from the publisher (though not as many), but you'd probably just print off a copy and mail it (because, you know, e-mail size quotas...)
Once the internet really started to take off, things started getting very interesting. Lots more people - including reporters, school teachers and - horrors! - members of the general public - were finding article abstracts and looking for the actual articles; publication costs and subscription fees kept rising (though at different rates); libraries continued to have their budgets squeezed and were, anyway, running out of physical space; and there was a huge proliferation of journals - copy-cat ones, vanity ones (every article by the editor-in-chief or a close associate), sub-sub-sub-discipline ones... Subscription bundles increasingly forced libraries into accepting a bunch of really marginal titles no-one wanted in order to get the few that were deemed absolutely essential (although there was usually much internal arguing about what those were), publishers continued to buy up publishers, and more professional society journals became major money-losers for their owners. The proliferation of titles meant that trying to do any kind of search in print abstracts was pointless but - the internet! Electronic publication? Cheap! Electronic abstract databases and searching? Fantastic! Converting library stacks into study space, offices, or a new research facility? Awesome!
Except you still had this huge amount of material that was only available on paper. Enter a variety of organisations and services like JSTOR and Science Direct: arrange to license, scan, and store that archive, and recover the costs for the enormous amount of work involved by selling subscriptions to institutional libraries that were increasingly providing electronic resources and database access.
This is already a long post, so I'll skip the more recent stuff, but I hope you get some sense of the incredible sea change that has occurred in academic publishing, mostly in just the last 10 years. I'm extremely lucky in that I get free access to a huge range of journals and books in electronic form right from my desk through my institution's library system. The amount of material inside JSTOR alone is incredible, but SOMEONE has to pay to keep the servers running, the network connections up, and the storage media both sufficient and working. I'm certainly not excusing some of the publishers (names withheld because everyone already knows who they are), but I was rather taken aback by Aaron Swartz's actions when the story first came out.
Academics who want their work to literally be freely available can certainly publish their work on their own web sites, or through open access journals (assuming they can afford the open access publication fees as authors - I certainly can't). We (as academics) have many more choices for how we make stuff available. Arguing that something like the contents in JSTOR should be completely free to anyone, anywhere in the world, at any time, misses the point though: there are real costs in providing such access, and someone has to pay them. And it's not just the raw content, but the services around accessing, navigating, and searching that content (metadata tags, anyone?) that make it really valuable.
Timothy B. Lee
Timothy is a senior reporter covering tech policy, blockchain technologies and the future of transportation. He lives in Washington DC.