Wiki Life: Collecting Stats


This week I’d like to give you a sneak peek at one of the tools that I use in my Wiki Life.

I use it to generate the Top Contributors of the Week Awards.

It’s a very simple web crawler, written in C# and WPF.

This doesn’t win any style awards, you understand; it is a quick and dirty tool to get the stats I need, written mostly in one evening for the task I had been given.

Below is a short (6-minute) video that shows the tool in action, with a small (24-hour) date range.

 

[View:https://www.youtube.com/watch?v=ufaReEg36FE]

 

Please forgive the lack of audio commentary.

Here is an outline of what you are seeing:

  • First I scan through all the pages from the “Updated Pages” section of the Wiki
  • Then (about 50 seconds into the vid) I scan each revision (history) page for each article that was found
  • Next (around 2 minutes in) I check each article’s revisions again, ditching old revisions and examining the “revision compare” page for each revision within our date range
  • From this information, I construct a thumbnail image of each document’s changes
  • Finally (at 5 mins, 30 secs) the tool does one last pass through all the revisions for all the articles, to see which article was quickest to be updated by another user (one of the awards)
  • At 6 minutes I show the resulting collection of image files generated from the crawl
  • I finish with a quick glance through the columns and sort options that help me quickly generate the Saturday charts
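
For the developers reading, the date filtering and "quickest update" steps in the outline above can be sketched roughly as follows. This is a minimal Python illustration, not the C# code of the tool itself. The `Revision` record and the names `revisions_in_range` and `quickest_follow_up` are hypothetical, and it assumes "quickest to be updated by another user" means the shortest gap between one revision and the next revision of the same article by a different author.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Revision:
    # Hypothetical record of one scraped revision row.
    article: str
    author: str
    timestamp: datetime


def revisions_in_range(revisions, start, end):
    """Keep only revisions whose timestamp falls in [start, end)."""
    return [r for r in revisions if start <= r.timestamp < end]


def quickest_follow_up(revisions):
    """Return (article, gap) for the shortest gap between a revision and the
    next revision of the same article by a different author, or None."""
    by_article = {}
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        by_article.setdefault(rev.article, []).append(rev)

    best = None
    for article, revs in by_article.items():
        # Compare each revision with the one that directly follows it.
        for prev, nxt in zip(revs, revs[1:]):
            if prev.author != nxt.author:
                gap = nxt.timestamp - prev.timestamp
                if best is None or gap < best[1]:
                    best = (article, gap)
    return best
```

Calling `revisions_in_range` first drops the old revisions, and `quickest_follow_up` on the result gives the article for that award, or `None` if nobody followed up on anyone else's edit in the range.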

 

If there are any fellow developers out there, you may ask why I physically load each page, instead of just processing the raw HTML responses from the server.

The answer is that many of these kinds of pages, like the revision compare pages, generate their content from JavaScript loaded in the page. That content is not available in the raw HTML response; it is only retrieved once the page has loaded.

For this reason, I have to physically load the page, wait for the JavaScript to pull the page content, THEN read the page.

This makes for a slow crawl, about 2 hours for a whole week, but it works fine as a background job.
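
The "load, wait, then read" pattern can be sketched as a simple polling loop. This is only an illustration in Python under stated assumptions, not how the WPF tool does it: `wait_until`, `read_when_ready` and the `FakePage` stand-in (with its `script_finished` check) are all hypothetical names, and a real crawler would ask an actual browser control whether the script-generated content has arrived.

```python
import time


def wait_until(is_ready, timeout=10.0, interval=0.25):
    """Poll is_ready() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakePage:
    """Hypothetical stand-in for a loaded page whose content only
    appears after its script has run for a few polls."""

    def __init__(self, polls_until_ready):
        self._remaining = polls_until_ready
        self.content = None

    def script_finished(self):
        if self._remaining > 0:
            self._remaining -= 1
            return False
        self.content = "<div>revision diff</div>"
        return True


def read_when_ready(page, timeout=10.0, interval=0.25):
    """Return the page content once it is ready, or None on timeout."""
    if wait_until(page.script_finished, timeout, interval):
        return page.content
    return None
```

The timeout matters in a long background crawl: one page that never finishes loading should cost a few seconds, not stall the whole run.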

 

There are still plenty of stats I plan to collect and present over the coming months.

If you have any ideas for other awards we could present from this data, please let us know and I will try to include them in future crawls.

 

Regards,

Peter Laker

Comments (13)

  1. tonyso says:

    Brilliant! Keep up the good work Peter!

  2. caio.vilas@live.com says:

    Nice tool Peter Laker! Do you have plans to share it with the Wiki contributors?

  3. Tomoaki Yoshizawa says:

    Awesome!!

  4. Ed Price - MSFT says:

    Wow, this is cool. A video too!

    Caio has a good point. You can publish your tool on MSDN Gallery/Samples (or even on TechNet Gallery) or on CodePlex if you want to share it.

    Thanks!

  5. XAML guy says:

    Thank you!

    @Caio, the functionality of scanning the wiki was a feature I was playing with in this test app, before porting it over to my TechNet Wiki Widget – social.technet.microsoft.com/…/13977.technet-wiki-widget-windows-8.aspx, but it will be some time before I'm ready to make it bulletproof enough for general use.

    If it is the web crawling aspect you are interested in, I have a complete example project you can study on MSDN Samples – code.msdn.microsoft.com/WPF-Automation-Loading-6ae6c88a

    If it is the thumbnail generation you want to know more about, check out this other project I posted on TechNet for more on WriteableBitmaps:

    social.technet.microsoft.com/…/13461.blackboard-design-pattern-a-practical-example-radar-defence-system.aspx

  6. Ed Price - MSFT says:

    Oh, wow. That Blackboard Design Pattern article needs to get featured! I love the Gallery + Wiki synergy!

    Thanks!

  7. Ed Price - MSFT says:

    Okay, it's featured on the home page of TNWiki. Thanks!

  8. Santosh Bhandarkar says:

    Very nice !

  9. XAML guy says:

    Yey, thanks Ed! I had a lot of fun making that one, and TNWiki gave me the platform, audience and incentive.

  10. Ed Price - MSFT says:

    I’ve got to ask… Does anyone else know about Wiki articles that build off of Gallery contributions in some way? I added Peter’s example to my blog post that promotes this Gallery + Wiki synergy:

    blogs.technet.com/…/gallery-technet-wiki-community-synergy.aspx

    I also created a tag on TechNet Wiki for us to track them:
    social.technet.microsoft.com/…/default.aspx
     

    Thanks!

  11. Eric Battalio says:

    This is great!

  12. Serhad MAKBULOĞLU says:

    Thanks Peter!

  13. hassan sayed issa20014 says:

    Congratulations