I Want To Know Links

published by Eric Mill on

It's been surprisingly frustrating to reliably track who's linking to my "Door to the FISA Court" post. I kept pretty on top of the major traffic drivers in Google Analytics and Twitter for the first few days after posting it, and it did pretty well. But 9 days later, my friend Brandon flagged a New Yorker piece that linked to it (in a long piece with many links), and I realized I had no practical way of finding links like that myself.

Here are some approaches I can take, and their tragic flaws:

  • Google includes a "link:" operator in searches, to find anything linking to a particular page, but it's nonsensical. Searching for links to just a domain only matches links to just the domain, and searching for links to individual URLs is terribly incomplete.
  • Google Alerts is broken and terrible. I've gotten nothing from mine for ages -- and then for the past week, I've had alerts on "link:isitchristmas.com" and "link:scout.sunlightfoundation.com" send me months- or years-old things, nearly every day.
  • Talkwalker Alerts works, and well, but lacks a "link:" operator.
  • Google Analytics will show you who referred visitors, but this is incomplete -- anything using SSL (https), like Hacker News, won't show up, because this blog doesn't use SSL. I'll be adding SSL soon, which will turn this approach from incomplete into very tedious.
  • Google Webmaster Tools will list links to your site, but it has two tragic flaws. It's too much: most of the listed domains, in my case anyway, were people who embedded top Hacker News posts in their site's frame for naked SEO grabs. These domains also float to the top of the list, because the link appears on every page of the site, leaving me to tediously sift through the long tail looking for stray mentions. It's too little: it's missing things. The New Yorker piece doesn't appear, I suspect because the New Yorker uses a noarchive tag to prevent its content from appearing in Google's cache. While the New Yorker still appears in Google's search results, my guess is that ducking the cache prevents Google from doing deep link analysis on the content.
  • mention is a popular-looking paid mention tracker, but it does only that: mentions. No links. [Update 2016-03-03: Mention does do links! And there was a period in 2014 where you could get a decent free account. But they've since raised prices considerably.]
  • Muck Rack is a well-designed service for tracking online journalism. It doesn't do links, though it is useful for seeing which journalists furthered my stuff on Twitter. But their alerts require a paid plan, beginning at $99 per month. Sorry, I'm just a dude.

So, I don't know what to do. I'll spend a bit of money, even. I just want to see links! Please let me pay you to tell me about links!

  1. Chavi

    You can use SEO tools to track links to specific pages. Try opensiteexplorer.com, ahrefs.com or majesticseo.com.
    All of them have free versions that will allow you to see a limited number of links to a limited number of sites/pages. They can be frighteningly accurate and have good filters, though they won't tell you how much traffic came through those links.

  2. Ed S.

    "and that it depends on somebody to actually click through a link"

    If no one clicks through a link, is it really a link?

  3. Eric Mill

    No, you're totally right. I am pretty dependent on other institutions to give me this ability; mostly large ones. :(

    I've thought about it more since posting this, and I think the best solution given the tools available now is to have a thing that watches the Google Analytics API for new referring URLs I haven't yet seen, and sends me emails as soon as they show up. The two limitations are that until I transition to SSL I'll miss SSL referers (but I will address this), and that it depends on somebody to actually click through a link (but this is probably fine in practice).
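The watcher described in this comment comes down to a "have I seen this URL before?" check plus a notification. Here is a minimal Python sketch of that core, under loud assumptions: the list of referring URLs is taken as already fetched (the Google Analytics API call itself is not shown), and the names `new_referrers` and `build_alert` are hypothetical, not from the post. It only builds the email; actually sending it is left out.

```python
from email.message import EmailMessage


def new_referrers(current, seen):
    """Return URLs from `current` that are not in `seen`, in first-seen order.

    `current` is an iterable of referring URLs (assumed to come from some
    analytics export); `seen` is a set that is updated in place.
    """
    fresh = []
    for url in current:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


def build_alert(urls, to_addr):
    """Build (but do not send) an email listing newly seen referring URLs."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = f"{len(urls)} new referring URL(s)"
    msg.set_content("\n".join(urls))
    return msg
```

In practice the `seen` set would need to be persisted between runs (a JSON file or a small database would do), and the check would run on a schedule. As the comment notes, this only surfaces links that somebody has actually clicked.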

  4. Mike

    Well, the only way to know about the links is to crawl the entire Internet. Regularly. Soooo...there's nobody who does that, but Bing and Google try. They don't provide this info (not even in their APIs, I believe), but you might have some luck using the Common Crawl. That's totally a lame solution though, and takes about a year before you'd get results.

    Super helpful comment, I know...mostly just commenting in case somebody else has something insightful. I have this curiosity too.

  5. Eric Mill

    I covered link tracking in my post, including GA and GWT -- they're incomplete in different ways, and very tedious. Plus, I'd like to see links that nobody actually clicks on.

  6. joy

    Actually, let me upturn your thinking on tracking links and point you in a new direction: track where your visitors come from via your Web site analytics, rather than hoping a third-party site might crawl the source of your referring traffic. I don't know how you're hosting this site, but most Web hosting providers provide your raw Web server logs and visual front ends like awstats, analog stats or even a JavaScript "recent visitors" chart.

    When a visitor comes to your site, their referring source (referrer) is usually (it depends on their browser/OS) recorded in your logs - other info like IP address, User Agent, and screen size is recorded too.

    The disadvantage is that it depends on if and how the referrer was recorded, as most Web stats programs truncate the referring domain. (However, I just did a spot check on my domain and it looks like Awstats can show the full URL if there is one.)

    Personally, I use visitor referrer tracking (it's the most dependable) and a mix of Google Webmaster Tools (I don't even bother with the link operator these days). As I do SEO, there are a number of SEO specific software services like the former SEOMoz and the Advanced Web Ranking SEO package - but honestly, I'm just going for the pulse of how my sites are doing.
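To make the raw-log approach from this comment concrete, here is a rough Python sketch that pulls full referring URLs out of web server access logs. It assumes the logs are in the common "combined" format (referrer and user agent as the last two quoted fields). The function name `external_referrers` and the `own_host` parameter are hypothetical, and the regular expression is deliberately simple: it will skip lines whose request field contains escaped quotes.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined log format: host ident user [time] "request" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def external_referrers(lines, own_host):
    """Count full referrer URLs that point in from hosts other than `own_host`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not in the assumed format; skip rather than guess
        referrer = match.group("referrer")
        if referrer in ("", "-"):
            continue  # the browser sent no referrer
        host = urlsplit(referrer).hostname
        if host is None or host == own_host:
            continue  # unparseable, or internal navigation on our own site
        counts[referrer] += 1
    return counts
```

Counting by full URL rather than by domain keeps the specific linking page visible, which is the detail the comment says many stats programs truncate.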