This week I’d like to give you a sneak peek at one of the tools that I use in my Wiki Life.
I use it to generate the Top Contributors of the Week Awards.
It’s a very simple web crawler, written in C# and WPF.
This doesn’t win any style awards, you understand. It is a quick and dirty tool to get the stats I need, written mostly in one evening for the task I had been given.
Below is a short (6-minute) video that shows the tool in action, with a small (24-hour) date range.
Please forgive the lack of audio track commentary.
Here is an outline of what you are seeing:
- First I scan through all the pages from the “Updated Pages” section of the Wiki
- Then (about 50 seconds into the vid) I scan each revision (history) page for each article that was found
- Next (around 2 minutes in) I check each article’s revisions again, ditching old revisions and examining the “revision compare” page for each revision within our date range
- From this information, I construct a thumbnail image of each document’s changes
- Finally (at 5 mins, 30 secs) I make one last pass through all the revisions for all the articles and check which article was quickest to be updated by another user (one of the awards)
- At 6 minutes I show the resulting collection of image files generated from the crawl
- I finish with a quick glance through the columns and sort options that help me quickly generate the Saturday charts
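For fellow developers, the date-range filter and the “quickest to be updated by another user” check from the steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the tool’s actual C# code: the `Revision` record, the function names, and the sample data are all made up for the example. It also assumes that “quickest” means the shortest gap between two consecutive revisions of the same article made by different users, which may not be exactly how the award is calculated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Revision:
    """Hypothetical record for one row scraped from a revision history page."""
    article: str
    author: str
    timestamp: datetime


def revisions_in_range(revisions, start, end):
    """Keep only revisions whose timestamp falls in [start, end)."""
    return [r for r in revisions if start <= r.timestamp < end]


def quickest_follow_up(revisions):
    """Return (article, gap) for the shortest gap between two consecutive
    revisions of the same article made by different authors, or None."""
    by_article = {}
    for r in revisions:
        by_article.setdefault(r.article, []).append(r)

    best = None
    for article, revs in by_article.items():
        revs.sort(key=lambda r: r.timestamp)
        # Compare each revision with the one that came directly before it.
        for prev, curr in zip(revs, revs[1:]):
            if prev.author == curr.author:
                continue  # an author following up on their own edit does not count
            gap = curr.timestamp - prev.timestamp
            if best is None or gap < best[1]:
                best = (article, gap)
    return best


# Made-up sample data covering a 24-hour range, plus one old revision.
start = datetime(2024, 1, 1)
end = start + timedelta(hours=24)
sample = [
    Revision("Article A", "alice", datetime(2024, 1, 1, 9, 0)),
    Revision("Article A", "alice", datetime(2024, 1, 1, 9, 5)),
    Revision("Article A", "bob", datetime(2024, 1, 1, 10, 0)),
    Revision("Article B", "carol", datetime(2024, 1, 1, 12, 0)),
    Revision("Article B", "dave", datetime(2024, 1, 1, 12, 20)),
    Revision("Article B", "dave", datetime(2023, 12, 25, 8, 0)),  # outside the range
]

recent = revisions_in_range(sample, start, end)
winner = quickest_follow_up(recent)
print(winner)
```

With this sample data the old December revision is dropped by the filter, and “Article B” wins with a 20-minute gap between carol’s edit and dave’s.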
If there are any fellow developers out there, you may ask why I physically load each page, instead of just processing raw HTML responses from the server.
Loading every page makes for a slow 2-hour crawl for a whole week, but it works fine as a background job.
There are still plenty of stats I plan to collect and present over the coming months.
If you have any ideas for other awards we could present from this data, please let me know and I will try to include them in future crawls.