Lessons learned creating BingMatrix in ASP.NET and Azure (goals, HTTP requests, threads, progress message and postbacks)

I have spent my last few weekends building an ASP.NET web application that sends multiple queries to Bing and displays the results in a table. I called it BingMatrix, and you can read more about what it does and how it works in the blog post titled “BingMatrix – A Windows Azure application that provides a fun way to mine data from Bing”. Along the way, I picked up a few more tricks for ASP.NET and for the Azure environment in particular. This post summarizes some of the things I learned while developing this web app. It’s worth mentioning that I had previously played with an earlier release of Azure, as I described in “Experimenting with Windows Azure and understanding its runtime environment better”.

1 – Have a clear functional goal

One of the interesting things about this project was the fact that I had a very specific functional goal in mind: to find the popularity of specific keywords on different sites using Bing. The initial need was to pinpoint which of the many registry keys used by the Windows file server protocol (SMB) was most commonly referenced. I found clues about that by querying Bing on those unique keywords while including a “site:” constraint in the search, as in the sketch below. I found that having a specific goal in mind is a great way to focus, even if all you’re really doing is experimenting with Azure.
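
To make that concrete, here is a rough sketch of how such a query URL can be put together. To be clear, this is not the actual MakeBingURL helper used by the app (that code isn’t shown in this post); the body below is just my illustration of the “site:” idea, with SMBDeviceEnabled as one example of an SMB-related registry value:

/* Sketch only, not the app's actual MakeBingURL helper: builds a Bing
   query URL with a "site:" constraint; strAdditional carries any extra
   search terms. HttpUtility lives in System.Web. */
protected string MakeBingURL(string strKeyword, string strSite, string strAdditional)
{
    string strQuery = "\"" + strKeyword + "\" site:" + strSite + " " + strAdditional;
    /* e.g. "SMBDeviceEnabled" site:microsoft.com becomes
       http://www.bing.com/search?q=%22SMBDeviceEnabled%22+site%3amicrosoft.com */
    return "http://www.bing.com/search?q=" + HttpUtility.UrlEncode(strQuery.Trim());
}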

2 – Break the project into manageable parts

When I first started developing this, while I had a loftier goal in mind, I focused on basic functionality first. On the first weekend, I got a single query to work. Next, I got results for multiple keywords and sites, but still as simple comma-delimited output. Only later did I get the actual HTML table output that looked a little prettier. Later steps included adding a link to the actual Bing query, multithreaded processing for added speed, parameter passing in the URL and a “processing…” message while the matrix was built. Throughout the process, I always had a functional site that provided at least a portion of the functionality I was planning.

3 – Don’t be afraid to refactor

I also learned that the Visual Studio 2010 refactoring tools are efficient and helpful. It’s amazing how poorly we name variables, classes and methods at first. In my old days, I was always afraid of “breaking” things by attempting to rename them. VS 2010 has gotten really good at performing this kind of refactoring, and it helped a lot with the readability of my final code.

4 – Getting an HTML page in ASP.NET is easy…

ASP.NET includes lots of functionality, so I’m always amazed at how much you can do just by leveraging the .NET Framework. For instance, here’s the code to get the response from a web page, which I used to get a response from Bing (it assumes using directives for System, System.IO and System.Net). There’s a bit of type gymnastics involved, but the code takes a string in (strURL) and outputs a single string as well (Results). I found that parsing the response as a string was easier than using an XML document, since I was just looking for a single item per page.

Uri BingURI = new Uri(strURL);
HttpWebRequest BingRequest = (HttpWebRequest)WebRequest.Create(BingURI);
BingRequest.Method = WebRequestMethods.Http.Get;
using (HttpWebResponse BingResponse = (HttpWebResponse)BingRequest.GetResponse())
using (StreamReader BingReader = new StreamReader(BingResponse.GetResponseStream()))
{
    /* Read the whole response into a single string; the using blocks
       close both the reader and the response when done */
    Results = BingReader.ReadToEnd();
}

I looked into using the Bing API, by the way, but decided to stick with the regular Bing page, since there were slight discrepancies between the hit counts returned by each when that number was fairly high. Using the actual Bing page made sure the user would likely see the same count when following the URL to Bing. It would also have been trickier to keep track of both URLs.
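
As an illustration of what that string parsing looks like, here is a minimal sketch. The “sb_count” marker is my assumption about Bing’s markup at the time; these markers are undocumented and do change, so treat this as a sketch rather than the app’s exact parsing code:

/* Sketch only: assumes Bing wraps the result count in a span whose
   class is "sb_count" (an undocumented detail that can change anytime) */
int intStart = Results.IndexOf("<span class=\"sb_count\"");
if (intStart >= 0)
{
    intStart = Results.IndexOf('>', intStart) + 1;
    int intEnd = Results.IndexOf("</span>", intStart);
    /* strCount ends up holding something like "1,230,000 results" */
    string strCount = Results.Substring(intStart, intEnd - intStart);
}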

5 – Multi-threading

At some point I noticed how issuing the Bing queries sequentially made the app slow, so I spent the time to make that portion of the code run asynchronously using multiple threads. I must admit that I was a little concerned about using multi-threading. I have used it in the past (not with ASP.NET) and I remember having lots of issues, especially when the threads needed to update the screen. This time around, however, it was fairly straightforward; processing multiple HTTP requests is a classic case for it. I used a string array to hold the results and delegates to define the actual procedure that gets the Bing URL and parses the output. Here are a few code snippets (they also assume a using directive for System.Threading):

/* Shared variables */

const int MAXRESULTS = 300;
int intPendingResults;
int intResultCount, intKeywordCount, intSiteCount;
string strBingURL;
string[] ResultsArray = new string[MAXRESULTS];
delegate void GetCountDelegate(int x, string strURL);

/* Callback function - Ends the work on one thread and decrements pending results */

private void GetCountCallBack(IAsyncResult ar)
{
    GetCountDelegate dlgt = (GetCountDelegate)ar.AsyncState;
    dlgt.EndInvoke(ar);
    /* Several threads can finish at once, so decrement atomically */
    Interlocked.Decrement(ref intPendingResults);
}

/* Send URL to Bing and parse output to get the count */

protected void GetBingCount(int x, string strURL)
{
    /* Lots of code here, including */
    ResultsArray[x - 1] = "<A href=\"" + strURL + "\" target=\"_blank\">" + Results + "</A>";
}

/* Loops through keywords and sites to create the matrix */

protected void MakeMatrix()
{
    /* Break down the list of keywords and sites */

    string[] Keywords = lblKeyword.Text.Split(';'); /* Split keywords */
    string[] Sites = lblSites.Text.Split(';'); /* Split sites */

    /* Asynchronously queue up all HTTP requests for counts */

    intResultCount = 0;
    intPendingResults = 0;
    intKeywordCount = 0;

    while (intKeywordCount < Keywords.Length)
    {
        intSiteCount = 0;
        while (intSiteCount < Sites.Length)
        {
            intResultCount++;
            if (intResultCount <= MAXRESULTS)
            {
                strBingURL = MakeBingURL(Keywords[intKeywordCount], Sites[intSiteCount], lblAdditional.Text);
                /* Increment atomically, since callbacks may already be decrementing */
                Interlocked.Increment(ref intPendingResults);
                GetCountDelegate cdel = new GetCountDelegate(GetBingCount);
                AsyncCallback cbak = new AsyncCallback(GetCountCallBack);
                cdel.BeginInvoke(intResultCount, strBingURL, cbak, cdel);
            }
            intSiteCount++;
        }
        intKeywordCount++;
    }

    /* Wait for all threads to complete, sleeping briefly so we don't spin the CPU */

    while (intPendingResults > 0) { Thread.Sleep(100); }

    /* Now output the results to the table (more code goes here) */

}

I looked for newer functionality to simplify multithreading in ASP.NET, but did not find anything readily, so I went with the more “classic” approach. It worked well, and the page now loads fairly fast even with a large number of keywords and sites.
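
For the curious, the kind of newer functionality I was looking for does exist in .NET 4 as the Task Parallel Library. Here is a sketch of how that alternative could look, assuming the query URLs were first flattened into a list (again, this is not what BingMatrix actually does):

/* Sketch of a .NET 4 Task Parallel Library alternative (not what
   BingMatrix uses); assumes BingURLs is a List<string> of prepared
   query URLs. Parallel.For lives in System.Threading.Tasks. */
Parallel.For(0, BingURLs.Count, i =>
{
    GetBingCount(i + 1, BingURLs[i]); /* same worker method as above */
});

One nice property of Parallel.For is that it blocks until all iterations finish, so the wait loop shown earlier would not be needed.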

6 – Showing a “Processing…” message

One item that was surprisingly hard to figure out was how to show a message while the matrix was being processed. The way ASP.NET works, the entire page is sent only when the processing is complete, so there was no built-in way to show a message while the processing was taking place. To overcome this, I had to add a little JavaScript code to the HTML page on the first load and force a postback. To be clear, I first added the following code to the code-behind page, in the Page_Load event:

protected void Page_Load(object sender, EventArgs e)
{
    if (Page.IsPostBack)
    {
        /* Second load (the forced postback): actually build the matrix */
        MakeMatrix();
    }
    else
    {
        /* First load: just capture the parameters from the URL */
        lblTitle.Text = Request["t"];
        lblKeyword.Text = Request["k"];
        lblSites.Text = Request["s"];
        lblAdditional.Text = Request["a"];
    }
    if (lblTitle.Text != "") { Page.Title = "BingMatrix: " + lblTitle.Text; }
}

This essentially makes sure that the first time the page loads, it simply takes the parameters from the URL without actually processing them. It’s only when the form is submitted (and the page is a postback) that the matrix is actually processed. Next, I made the page show the message and automatically submit the default form as soon as it finishes loading. This was done on the ASPX page itself (the HTML portion).

<% if (!Page.IsPostBack) { %>
<h2>Building your BingMatrix with the parameters above. This will take a moment...</h2>
<script type="text/javascript">
    function mypostback() { document.forms["main"].submit(); }
    window.onload = mypostback;
</script>
<% } else { %>
<asp:Table ID="tblResults" runat="server" BorderWidth="1" BorderStyle='Solid' GridLines='Both' HorizontalAlign='Center'></asp:Table>
<asp:Button ID="btnEditQuery" runat="server" Text="Edit this query" onclick="btnEditQuery_Click" />
<% } %>

Please note that “main” might not be the name of your form, so make sure to put the right name there if you’re planning to reuse this in your own ASP.NET page. I know it looks a bit complex for something as simple as displaying a message, but I could not find a simpler way to implement this kind of deferred update other than using some JavaScript in the page itself plus code-behind logic to check for postbacks. I started looking into some of the AJAX functionality that could help with this, like update panels and such, but this was simple enough for my goal.
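
For reference, the JavaScript above assumes the page’s server-side form is declared along these lines (the id “main” here is just this app’s choice; use whatever your own page declares):

<form id="main" runat="server">
    <!-- page controls, including the table and button above, go here -->
</form>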

7 – Using the new Azure portal

This project was also the first where I used the new Windows Azure portal. The overall process of creating, publishing and uploading an ASP.NET app has not changed much. However, this new Silverlight-based portal is different. It is really well designed and I could easily find my way around it. One thing I would recommend is that you include the upload date in the deployment name (for instance, “BingMatrix 2011-01-15”), regardless of whether it’s production or staging. You see, you typically deploy to staging and then swap that with the production deployment, and you can’t rename deployments later. Using a date (or date/time, if you update that frequently) as part of the name will make it easier to figure out which one is which after you swap their virtual addresses (the Swap VIP button in Azure) and try to delete the older one.

I’m considering adding more features to BingMatrix, like the ability to save queries and to see which queries are most used and which sites are most commonly targeted. This will require me to start using persistent storage (which right now I don’t need). I was considering simply saving the text passed in the URL into a SQL Azure database or Azure Blob storage. The other option is to implement a full login process (maybe using Facebook) and keep each user’s query history completely separate. That would be more work for me, and for users who simply want to pass a query and get a result. Still thinking about it… Feel free to share which you would prefer.

So that’s it for now. Feel free to drop comments on how you like BingMatrix and to provide feedback on the code snippets I shared. You see, I currently work as a Program Manager, and this development work is more of a hobby for me. Still, it did help me understand how Azure development works and what common tasks you have to go through to get apps deployed in Azure.