Quora Keeps the World's Knowledge For Itself

published by Eric Mill on
The Stockholm Public Library's proposed Wall of Knowledge.

I recently made the mistake of answering a question on Quora: "What is it like to be a member of the 18F team?"

I have a Quora account, but I've almost never used it. I noticed this question because the user somehow found me and "asked" me to answer it, which caused Quora to email me. It's a fine question, the kind of question Quora excels at getting answered, and I think the answers it produced will stand as an accurate snapshot of how the early 18F team feels about their job and their work.

Sadly, Quora's policy is to lock that snapshot in its own private store and keep it out of the historical record.

The Internet Archive is one of the world's most fantastical organizations, housed in a majestic temple, and has been maniacally devoted to archiving the entire Internet since 1996.

At the Archive's Wayback Machine, you can search through and relive an unbelievably rich swath of the Internet's history. It's one of the most valuable and widely cited collections in the world, and it's actually difficult now to imagine tolerating an Internet without it.

But if you visit Quora's /robots.txt (the Internet's standard for sending web crawlers a strongly worded letter) you'll see that they single out the Internet Archive for exclusion. The Internet Archive respects the robots.txt standard, and so there is no Wayback Machine content for Quora.

Quora recognizes that this is significant enough to merit an explanation in their robots.txt.

 # People share a lot of sensitive material on Quora - controversial political
 # views, workplace gossip and compensation, and negative opinions held of
 # companies. Over many years, as they change jobs or change their views, it is
 # important that they can delete or anonymize their previously-written answers.
 # We opt out of the wayback machine because inclusion would allow people to
 # discover the identity of authors who had written sensitive answers publicly and
 # later had made them anonymous, and because it would prevent authors from being
 # able to remove their content from the internet if they change their mind about
 # publishing it. As far as we can tell, there is no way for sites to selectively
 # programmatically remove content from the archive and so this is the only way
 # for us to protect writers. If they open up an API where we can remove content
 # from the archive when authors remove it from Quora, but leave the rest of the
 # content archived, we would be happy to opt back in. See the page here:
 # https://archive.org/about/exclude.php
 # Meanwhile, if you are looking for an older version of any content on Quora, we
 # have full edit history tracked and accessible in product (with the exception of
 # content that has been removed by the author). You can generally access this by
 # clicking on timestamps, or by appending "/log" to the URL of any content page.
 # For any questions or feedback about this please email robotstxt@quora.com.

 User-agent: ia_archiver
 Disallow: /

(Update: After publication of this piece, Quora added the first paragraph above. I've updated the excerpt above to match their current robots.txt.)
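To make the effect of those two directives concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows how a crawler that honors robots.txt would read the rules quoted above. The question URL and the second user agent name are made up for illustration, and this is not a claim about how the Archive's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# The two directives from the excerpt above, without the explanatory comments.
QUORA_RULES = """\
User-agent: ia_archiver
Disallow: /
"""


def can_crawl(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URL and second crawler name, used only for illustration.
EXAMPLE_URL = "https://www.quora.com/some-question"

print(can_crawl(QUORA_RULES, "ia_archiver", EXAMPLE_URL))   # the Archive's crawler
print(can_crawl(QUORA_RULES, "SomeOtherBot", EXAMPLE_URL))  # any other crawler
```

Under these rules the first check comes back `False` and the second `True`: the exclusion targets the Archive's crawler by name and leaves everyone else alone.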

So, Quora's rationale for blocking the Internet Archive is that Quora can't go back and automatically rewrite history whenever one of its users wants to.

Bear in mind, you can already remove individual pages (in their entirety) from the Internet Archive by adding them to your robots.txt. Quora is asking for the ability to excise specific content from already archived pages.
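As a rough sketch of that page-level alternative, the snippet below builds a robots.txt that excludes only specific paths for `ia_archiver`, then checks it with Python's standard `urllib.robotparser`. The helper name, the paths, and the `example.com` host are all hypothetical, and it assumes the Archive honors per-path `Disallow` lines in the way described above.

```python
from urllib.robotparser import RobotFileParser


def build_archive_exclusions(removed_paths):
    """Return robots.txt text that blocks ia_archiver from only the given paths."""
    lines = ["User-agent: ia_archiver"]
    lines += [f"Disallow: {path}" for path in removed_paths]
    return "\n".join(lines) + "\n"


def archive_may_fetch(rules: str, url: str) -> bool:
    """Check whether ia_archiver may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("ia_archiver", url)


# Hypothetical paths: one page the author withdrew, one left untouched.
rules = build_archive_exclusions(["/answers/retracted-answer"])
print(rules)
print(archive_may_fetch(rules, "https://example.com/answers/retracted-answer"))
print(archive_may_fetch(rules, "https://example.com/answers/another-answer"))
```

Note that `Disallow` matches by path prefix, so the granularity is a whole page (or a whole subtree), never a single answer inside a page. That is exactly the gap between what robots.txt offers and what Quora is asking for.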

Can you imagine if the Archive implemented such a feature? It would make the Archive's historical record completely untrustworthy, and destroy its credibility. You can bet many people, companies, and governments would love the ability to go in and selectively excise or modify the historical record of their work.

But that's not how history works, and it's definitely not how the Internet works. Quora is not a private communications network. When users contribute to Quora, they're participating in Quora's mission: to "share and grow the world's knowledge". Like publishing a book, making a TV show, or any other form of human broadcasting: once it's out there, you can shape its use, but you don't get to withdraw it from the public record.

The Library of Alexandria.

What Quora is asking for from the Internet Archive — and really, since the Archive has no public competition, from the Internet — is unreasonable, short-sighted, and selfish. Quora is simply being a shark about "their" content, at the public's expense.

I usually try to be generous about people's motives, but I'm comfortable assuming the worst here. Quora's reason is simply too flimsy, and its business incentives too tangled up in the outcome, for its comment above to be the full story.

Quora is a free service built on venture capital that will need to monetize its users over the next couple of years, and wouldn't you know, they really want you to visit quora.com, and they really want you to create an account.

In fact, until very recently, Quora would block visitors from seeing more than the first answer unless they logged in. I'm willing to bet most people reading this have run into that popup before.

They've taken down the popup, but all that content is still entirely under Quora's control. If Quora were to collapse next year, it's completely unclear what would happen to the human output they've collected. This is not theoretical: unilateral mass destruction of user-generated content happens all the time.

Quora's question-and-answer system is a generalization of the extraordinarily successful model of Stack Overflow, a Q&A site for coders that quickly became the world's free google-able university and teaching assistant for anyone working in technology.

It's not an overstatement to say that Stack Overflow has completely changed how, and how fast, software developers get their work done today. If you were to chart the days I visit Stack Overflow, it would look awfully similar to the chart on my GitHub profile. Stack Overflow has since expanded into the Stack Exchange network, and in many ways is a direct competitor to Quora.

Like Quora, Stack Overflow is privately run, and like Quora, Stack Overflow depends on a community of active users to generate activity, knowledge, and revenue.

Unlike Quora, Stack Overflow has no problem being archived back to Day 1 of its existence — presumably because the founders understand how the Internet works and understand what it actually means to grow the world's knowledge.

Until Quora understands this, I'll be contributing my knowledge to the world, and not to Quora.

The Internet Archive's ceramic archivists. Photo by Tom Foremski of Silicon Valley Watcher.