Following the FISA Court: New Website, New Data

published by Eric Mill on

Short version: The FISA Court has a new website, and @FISACourt is way smarter because I'm turning their entire docket into data.

Longer version:

One year ago, a large population was told their entire network was being surveilled. A short time later, the Foreign Intelligence Surveillance Court (also known as the "FISA Court", or "FISC"), which oversees surveillance policy in the US, began publishing a docket after 35 years of silence.

Ever since, I've been following the Court by running @FISACourt on Twitter, which does stuff like this:

@FISACourt is powered by an open source tool I wrote to watch the Court's public docket site. Within 5 minutes of any change to the docket, the tool automatically posts a tweet with a link to the HTML change, emails me, and texts my personal cell phone. When that happens, I quickly investigate and read anything that was posted, and follow up with a hand-written explanation.
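The core of that watch-and-alert loop is comparing the page as it looks now against the last copy saved. Here is a minimal sketch in Python, not the tool's actual code: the function names are made up for illustration, and fetching the page, scheduling the check every few minutes, and sending the tweet, email, and text are all left out.

```python
import difflib


def docket_diff(old_html: str, new_html: str) -> list[str]:
    """Return unified-diff lines between two saved snapshots of a page."""
    return list(
        difflib.unified_diff(
            old_html.splitlines(),
            new_html.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


def docket_changed(old_html: str, new_html: str) -> bool:
    """True when the two snapshots differ on at least one line."""
    return bool(docket_diff(old_html, new_html))
```

Anything that comes back from docket_diff is what a human would then go and read; an empty list means nothing happened since the last check.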

Do your reading

Because of @FISACourt, I've read basically every public filing the Court has made, and have very often been the First To Know and First To Tweet.

Reading all those filings feels amazing. If you want to know what's going on in this world, you need to read primary sources. The public docket tells a story, and few in the media are doing any more than reporting on the most exciting spikes.

The FISC's docket is a tiny piece of the action surrounding surveillance, but since the nation's only Court dedicated to surveillance first opened its ad hoc internet welcome mat in June of 2013, allowing the public a window into its work for the first time in 35 years, we've gotten to see:

  • The Court declassifying and publishing a full justification for allowing bulk collection of phone metadata under the Fourth Amendment.
  • An actual order by the Court authorizing bulk collection of telephone metadata under section 215 of the Patriot Act — for what I believe is the first time — along with a legal justification for granting it.
  • The Department of Justice scolded by the Court for failing to inform the Court of relevant pending lawsuits in a dispute with the EFF over preserving bulk telephony metadata.
  • The Department of Justice put on the spot by the Court after refusing to declassify any part of an opinion interpreting section 215 of the Patriot Act. (The government had previously been ordered by the Court to review it for declassification.)
  • The sad end to a once-promising lawsuit by major technology companies chafing at the government's restrictions on publishing exact request statistics: the companies settled with the government, and the granularity of reporting increased from 1000's to 250's, as long as some kinds of requests are jumbled together.
  • The Court responding to inquiries from prominent senators by publicly describing its operations, and following up with numbers on how often it makes the government revise its requests before granting them (~25% of the time).

We also saw the FISA Court publish a law student's brief arguing the Court's unconstitutionality (and citing my blog!), and then retract it without any notice (here's the original). I'm not judging the merits of the brief's arguments, but the Court's decision to deny the request as "moot", after making it moot by sitting on the request for 4 months, is obnoxious and beneath the Court.

Nonetheless, the Court is to be praised for making it possible to see all this. The Court, so secret that they won't label their door, made a quick-and-dirty website last year while under the international spotlight, and just started uploading PDFs. Messy, but compared to the 35 years of silence before it, it was a huge step.

A brand new website

In another step, the FISA Court recently relaunched their site, now at fisc.uscourts.gov. It's a Real Website, built with the open source framework Drupal. It offers some new things, like their judges, rules, and even an RSS feed.

Of course, the filings are still scanned, un-searchable image PDFs, their utility hobbled by some combination of backwards processes and redaction paranoia, and the FISC's own search engine suffers the consequences. But the RSS feed is a big deal -- anyone can follow the docket, in more-or-less real time, without needing something like @FISACourt.

Most importantly, the Court is committing to a public presence in a way their last docket never did. The site is here to stay, at its own URL and subdomain, and there's a much more solid foundation from which to make technical improvements. They're acknowledging their visible role in our nation's democracy, and deliberately engaging the public to help them better understand the Court's work. That's courageous and an investment, and I appreciate it.

Turning the FISA Court into data

Naturally, the new website immediately broke @FISACourt. I decided to see it as an opportunity, and rewrote the whole tool to produce a comprehensive, up-to-date dataset of the Court's public docket.

I'm now doing a proper crawl of the entire FISC website, and saving data for every filing. For example, this motion becomes:

 ---
 file_url: "http://www.fisc.uscourts.gov/sites/default/files/105B%28g%29%2007-01%20Order-1.pdf"
 title: "Order (June 17, 2013)"
 dockets:
 - name: "105B(g) 07-01"
   url: "http://www.fisc.uscourts.gov/docket/105bg-07-01"
 posted_on: '2014-04-15'
 landing_url: "http://www.fisc.uscourts.gov/public-filings/order-june-17-2013"
 id: "order-june-17-2013"
 last_sha: "ae16d4a74586762659efe28cb8611a796f42462380a3f140475f549a7b047cd8"
 last_etag: "240e18-5c510-4f71750b21840"

That data is in YAML, a humane data format that is easy for people to read and lends itself well to line-by-line diffs.
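In the record above, the id matches the last path segment of landing_url, and file_url is a percent-encoded filename. A small Python sketch of pulling both out follows. Treating the id as "the last segment of the landing URL" is an assumption based on this one example, and the function names are hypothetical.

```python
from urllib.parse import unquote, urlparse


def filing_id(landing_url: str) -> str:
    """Take the last path segment of a filing's landing page as its id."""
    return urlparse(landing_url).path.rstrip("/").split("/")[-1]


def decoded_filename(file_url: str) -> str:
    """Decode the percent-encoded filename at the end of a PDF URL."""
    return unquote(urlparse(file_url).path.split("/")[-1])
```

Run against the example record, filing_id gives back "order-june-17-2013", and decoded_filename turns the %28, %29, and %20 escapes back into parentheses and spaces.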

I'm also downloading the actual PDFs, and watching them for any future changes. As the Court showed with its unexplained withdrawal detailed above (and as the Supreme Court frequently demonstrates), it's important to watch the public record for revisions of its history. (Update: my friend David Zvenyach is now operating a similar bot to watch the Supreme Court, at @SCOTUS_servo.)
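The last_sha and last_etag fields in the record are what make this kind of watching possible. Here is a minimal Python sketch of the idea, assuming last_sha is a SHA-256 hex digest of the PDF bytes (the 64-character value in the example is consistent with that, but it is an assumption) and that the ETag comes from the server's response headers. The function names are hypothetical.

```python
import hashlib
from typing import Optional


def sha256_hex(body: bytes) -> str:
    """Hex SHA-256 digest of a downloaded file's bytes."""
    return hashlib.sha256(body).hexdigest()


def detect_change(record: dict, body: bytes, etag: Optional[str] = None) -> bool:
    """Compare a fresh download against the stored hash.

    Updates record["last_sha"] (and record["last_etag"] when given)
    in place, and returns True if the content differs from what was
    last recorded. A record with no stored hash counts as changed.
    """
    new_sha = sha256_hex(body)
    changed = record.get("last_sha") != new_sha
    record["last_sha"] = new_sha
    if etag is not None:
        record["last_etag"] = etag
    return changed
```

The hash is the source of truth here. The ETag is only stored alongside it, since a server can change an ETag without changing the file.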

The data and PDFs are versioned in their own branch, so you can use GitHub's dedicated Atom feed to watch every change the system detects, or use the GitHub Contents API to sync up.
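For anyone who wants to follow along programmatically, here is a rough Python sketch of both options. The owner, repo, branch, and path arguments are placeholders you would fill in yourself, and the actual HTTP requests are left out. The URL shapes are GitHub's standard per-branch commits feed and Contents API endpoint.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def commits_feed_url(owner: str, repo: str, branch: str) -> str:
    """Atom feed of commits on one branch of a GitHub repository."""
    return f"https://github.com/{owner}/{repo}/commits/{branch}.atom"


def contents_api_url(owner: str, repo: str, path: str, branch: str) -> str:
    """GitHub Contents API URL for a path on a given branch."""
    return f"https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={branch}"


def entry_titles(atom_xml: str) -> list[str]:
    """Extract the title of each entry from an Atom feed document."""
    root = ET.fromstring(atom_xml)
    return [
        entry.findtext("atom:title", default="", namespaces=ATOM_NS).strip()
        for entry in root.findall("atom:entry", ATOM_NS)
    ]
```

Each entry in the commits feed corresponds to one commit, so a new title showing up means the system recorded a change.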

Keep watching

Source: Reset the Net

Nearly a year after Snowden revealed the extent of government surveillance, some of the spotlight and novelty has faded. People no longer drop everything to read a leak-driven Guardian or Intercept story. But the technical and legal mechanics of surveillance, now partly visible to the public eye, are going to be fought over for years to come. We need to keep watching.

If you want to use the FISA Court's data, or the scraper, or have any questions at all, please open an issue over at the project. It's all public domain and you don't need my or anyone's permission, but I love to help out and talk about this stuff.

And hey: if you're reading this and you actually work for the FISA Court, check out the notes I wrote for the Court on how to make your website better, and how to eliminate the need for a system like mine to scrape your HTML. (Also: maybe set up a public email address.)