Jay posted about his languishing site traffic since a URL migration for his blog, and I thought I'd get in on the action as well.
Now, I don't work in Search, and I'm essentially speaking from the logical inference portion of my bum here, so please feel free to correct any erroneous assertions I've made.
The Way Of The Blog
One of the first things I read about corporate blogging (and I recommend anyone who's thinking about it go and read it) was the Corporate Weblog Manifesto, by Robert Scoble.
Lurking near the middle section is The Terrible Twelve:
12) Never change the URL of your weblog. I've done it once and I lost much of my readership and it took several months to build up the same reader patterns and trust.
This is a really, really, really important point. I can't overstate how important this is from a traffic perspective, especially with the added measures employed these days in post-honeymoon-period blog security.
Speaking for myself, I have a "core" group of people that read this blog regularly, and are probably subscribed to it directly. That number took an instant nosedive when the URL changed, and the traffic is still well below where it used to be (which also tells me I'm not nearly as interesting as I was last year. I blame phenylalanine).
A reasonably large portion of traffic used to come from search engines, particularly Google. Since the migration, search traffic has all but dried up. Why?
Well, this is where my crazy random uninformed theories come into play, but I think it's something like this:
I publish a story called "Tristan's Story". The URL is blogs.example.com/tristank/story .
Someone else thinks the story is cool, and links to blogs.example.com/tristank/story .
Other people (let's say bloggers) link to the same URL, or to the URL of a post that references the original URL, forming a tree of links (nb: I prefer "Pyramid Schemes" to "Tree Schemes", so I'm going to call it a pyramid).
Quick note on assumed search mechanics here: My guess is that most crawly search engines give each link in the pyramid a weighting based on the number of children (and their weight, based on their children and so on), and the overall weight counts towards the top link in the pyramid.
So when linking occurs using a regular static-HTML-page type of setup (a not-blog), weight is quickly added to that story, and it bubbles up the search engine results.
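To make the guessed-at pyramid weighting concrete, here's a toy sketch in Python. It is only an illustration of the guess above, not how any real search engine is known to score pages: the URLs other than the story URL, the `INBOUND` table and the `weight` function are all made up, and it assumes the links form a tidy tree with no loops.

```python
from typing import Dict, List

# Hypothetical link pyramid: each URL maps to the pages that link to it.
# Everything except the story URL is invented for illustration.
INBOUND: Dict[str, List[str]] = {
    "blogs.example.com/tristank/story": ["a.example/post", "b.example/post"],
    "a.example/post": ["c.example/post"],
    "b.example/post": [],
    "c.example/post": [],
}


def weight(url: str, inbound: Dict[str, List[str]]) -> int:
    """Count 1 for the page itself plus the weight of every page linking to it.

    Assumes the link graph is a tree (no cycles); a loop would recurse forever.
    """
    return 1 + sum(weight(child, inbound) for child in inbound.get(url, []))


story_weight = weight("blogs.example.com/tristank/story", INBOUND)
print(story_weight)
```

With this made-up table the story at the top of the pyramid scores 4: one for itself, two for the blogger who was linked in turn, and one for the blogger who wasn't.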
Trackbacks and Comments
For blogs in particular, there's another important and powerful mechanism here: Trackbacks - and in a similar manner, comments. I reckon bloggers have typically been even more powerful than regular news sites, largely due to these.
Reason being: if another blogger links to the story URL, a trackback is often sent to the site hosting the original story. This is a link to the new story by the new blogger that's commonly published on the original site. News sites tend not to generate trackbacks (but have a good search engine rank anyway, as many sites link to news articles).
Let's say that the trackback link from the new blogger's post is published directly below the article, in the comments section. Next time a search engine bot crawls the page, the trackback has a reciprocal effect, also improving the rank of the linker, which improves the rank of the original article as well, and so on.
Over time, more people link to one of the links that link to my link, so the story improves in search engine ranking, and becomes more discoverable in that search engine (eg, a hit nearer the top of search results for "Tristan's Story").
Historically, for one of my more popular posts, this tends to mean that there's an initial viewing by The Regular Gang, some amount of trackbacks and linking, and then the search engine hits start rolling in a couple of days later, followed by some more links, and so on.
Comments typically allow the commenter to leave a URL with their comment, and Back In The Day, this would be crawled with as much vigour as any other link.
Throwing A Spanner In The Works
Here's where I'm unclear on methods employed by bots - if anyone wants to correct me on specific search engines, The More I'll Know.
What happens to all that weight when the URL changes?
Well, the weight essentially points at what's more likely to be a dead link now.
In our case, there's a redirect from a given blogs.msdn.com URL to the equivalent blogs.technet.com URL: The connection to the old URL results in an HTTP 302 (I think), and the content is usually still discoverable via the redirect.
But it seems (to me; just a feeling) that bots might not count a redirection to some content's new home as being "as good as" the old content. After all, someone could have taken over the domain and redirected the bot to any old page full of advertisements. I have no idea how clever the bots are with content comparison and checking.
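For what it's worth, HTTP itself distinguishes between redirects that say "this has moved for good" (301, 308) and ones that say "look over there for now" (302, 307). The sketch below just sorts a status code into one of those buckets using Python's standard `http.HTTPStatus`; the `redirect_kind` function name is my own invention, and how any particular bot actually treats each kind is not something this shows.

```python
from http import HTTPStatus

# Status codes defined by HTTP as permanent vs. temporary redirects.
PERMANENT = {HTTPStatus.MOVED_PERMANENTLY.value, HTTPStatus.PERMANENT_REDIRECT.value}
TEMPORARY = {HTTPStatus.FOUND.value, HTTPStatus.TEMPORARY_REDIRECT.value}


def redirect_kind(status: int) -> str:
    """Classify an HTTP status code as a permanent redirect, a temporary one, or other."""
    if status in PERMANENT:
        return "permanent"
    if status in TEMPORARY:
        return "temporary"
    return "other"


print(redirect_kind(302))
```

If the old URL really is answering with a 302, that lands in the "temporary" bucket, which at least fits the feeling that the weight may not be following the content to its new home.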
Old Grey Trackbacks and Comments, She Ain't What She Used To Be
To add to the "your URL ain't what it used to be" problem, the landscape has also changed for trackbacks and comment URLs.
Because the system was so open before, trackbacks and comment URLs (where you leave your website address along with a comment you make on a story) became an easy target for link spammers, out to improve their search engine ratings using the same technique as arguably-more-legitimate users. In some cases, bloggers had to spend a significant amount of time deleting *hundreds* of comment spams to a single blog, all made over the period of a few hours. The payoff for the spammers: Better search engine ranking.
To address this (a guess, again), Community Server (and I'm assuming other blog engines) implemented redirection URLs for comments, so instead of leaving an actual URL on a page with a comment or trackback, you leave an address that's looked up internally, and redirected to, rather than a "raw" URL on the page.
And the nail in the coffin: the "nofollow" link attribute (rel="nofollow") has been implemented for some of these URLs in many popular blog engines, telling search engines to back off and disregard them.
So whereas with .Text 095, you might have ended up with a link like <A href="blogs.technet.com/tristank">My Link</A> left directly in the comments section - a link with real value to search engines - with Community Server, we now have something closer to <a rel="nofollow" href="www.example.com/blogredirect.aspx?target=http://blogs.technet.com/tristank">, which is fundamentally worthless to search engines: even if they handle the 302, the "nofollow" tells the bot that the link is not to be crawled.
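Here's a rough sketch of how a crawler might tell those two kinds of link apart, using Python's standard `html.parser`. The `LinkCollector` class and `collect_links` function are hypothetical names of mine, the sample markup is just the two links from the paragraph above (with a closing tag added to the second), and real bots presumably do a great deal more than this.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class LinkCollector(HTMLParser):
    """Collect (href, is_nofollow) pairs for every <a> tag that has an href."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A> arrives as "a".
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens; look for "nofollow".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def collect_links(html: str) -> List[Tuple[str, bool]]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


SAMPLE = (
    '<A href="blogs.technet.com/tristank">My Link</A>'
    '<a rel="nofollow" href="www.example.com/blogredirect.aspx'
    '?target=http://blogs.technet.com/tristank">My Link</a>'
)

for href, nofollow in collect_links(SAMPLE):
    print(("skip" if nofollow else "count"), href)
```

The old-style link comes out as one to count, and the redirected Community Server style link comes out as one to skip.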
And so, the loyal cadre of regular blog readers still know where to find you, but your old articles probably won't (ever) have the same weight that they used to, so your posts aren't considered as relevant as they used to be, and the search engine traffic tends to dry up (or at least, it has for me so far).
Gradually, as you post stories that people link to, you'll start rebuilding an audience, and a better search engine ranking, and so on.
Comments/trackbacks aren't nearly as valuable these days, so the primary objective is to get linked in the body of the post by another blogger. Meaning, you have to say something interesting that's worth linking to! Meaning, I'm damned. Forever. Ah well, it's not like we're allowed to sell ad space on our corporate blogs anyway...
At the end of the day, the landscape is harsher than before, and if you're blogging for a living, you might want to work out how to keep that clunky old URL (and slap some banner ads on it, quick!).