Imaginary Content - No Really!

When looking for ways to improve their site, the development team for MSDN.microsoft.com and Technet.microsoft.com wanted to leverage a new feature set of ASP.NET: virtual path providers and dynamic compilation. What does a virtual path provider do? It allows you to make your content disappear.

Well, OK, we still have content. But gone are the days of web servers with big hard drives; instead, we can put everything in SQL. Sure, we could have stuffed our content into SQL before, but if we wanted that content to contain dynamic code, getting it compiled was nearly impossible.

So, Houdini, how do you make your content disappear? MSDN converted the file-based content and stored it in SQL as XML. The team then built a rendering system with a standardized header, footer, and TOC. As part of this rendering system, a Virtual Path Provider plugs into ASP.NET and allows ASP.NET to compile pages that don't really exist: we put them together from the different pieces (header, footer, TOC, body) and give them to ASP.NET on demand.

Isn't compiling resource intensive? Yes, it is. But when the web server receives a request for content, the cache is checked first; only if the content is not in the cache is the call made to the virtual path provider and the page compiled. The page is then cached for a sliding time period.
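To make the "check the cache first, compile only on a miss" flow concrete, here is a minimal sketch in Python rather than ASP.NET. It is not the MSDN rendering code: the names SlidingCache, serve and build_page are hypothetical, and build_page just stands in for the expensive trip through the virtual path provider and compiler. "Sliding" here means every hit pushes the expiry time out again, so pages that keep getting requested stay cached.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SlidingCache:
    """Cache whose entries expire `ttl` seconds after they were last read."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so the behavior can be tested without waiting
        self._entries: Dict[str, Tuple[str, float]] = {}  # path -> (page, expiry time)

    def get(self, path: str) -> Optional[str]:
        entry = self._entries.get(path)
        now = self.clock()
        if entry is None or entry[1] <= now:
            # Missing or expired: drop it and report a miss.
            self._entries.pop(path, None)
            return None
        page = entry[0]
        # Sliding expiration: a hit resets the clock for this entry.
        self._entries[path] = (page, now + self.ttl)
        return page

    def put(self, path: str, page: str) -> None:
        self._entries[path] = (page, self.clock() + self.ttl)


def serve(path: str, cache: SlidingCache, build_page: Callable[[str], str]) -> str:
    """Return the page for `path`, building it only on a cache miss."""
    page = cache.get(path)
    if page is None:
        page = build_page(path)  # hypothetical stand-in for the expensive compile
        cache.put(path, page)
    return page
```

With a 10 second window, a page requested at 0, 8 and 16 seconds is built once; a request at 30 seconds, more than 10 seconds after the last hit, triggers a rebuild.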

As with all first attempts, there was a learning curve to understand the production behaviors. A few ‘gotchas’ we ran into:

· Application Pools:

o Auto-App Pool recycling: Turn it off! Once your cache is ‘warmed’ up, you don’t want it dumped on a regular basis. Code your rendering system to clean unwanted content out of the cache.

o Control your app pool recycles. Once an app pool recycles, your SQL servers will be very busy while it refreshes the cache.

o When there is a need to refresh the caches, do it in a controlled manner: recycle one app pool at a time, and allow 1 minute between recycles so you don’t hammer the SQL servers.

Below is a script for recycling the app pools on a single server. I call this script from a bat file that loops through all the servers:

RecycleAp.vbs SERVER1 AppPool1

Sleep 60

RecycleAp.vbs SERVER2 AppPool1

Sleep 60

RecycleAp.vbs SERVER3 AppPool1

Sleep 60

--Script; save as RecycleAp.vbs (Thanks, Chris Montgomery)--

' RecycleAp.vbs: recycle IIS application pools on a remote server through WMI.
' By default, recycle all app pools
bRecycleAll = 1

If WScript.Arguments.Count = 0 Then
    WScript.Echo "Syntax: recycleap server [AppPoolName]"
    WScript.Echo "Example: recycleap SERVER DefaultAppPool"
    WScript.Quit (0)
Else
    strServer = WScript.Arguments(0)
    ' An optional second argument limits the recycle to one app pool
    If WScript.Arguments.Count > 1 Then
        strAppPoolName = WScript.Arguments(1)
        bRecycleAll = 0
    End If
End If

' Connect to the IIS WMI namespace on the target server
Set Locator = CreateObject("WbemScripting.SWbemLocator")
' 6 = wbemAuthenticationLevelPktPrivacy (encrypts the WMI traffic)
Locator.Security_.AuthenticationLevel = 6
Set Service = Locator.ConnectServer(strServer, "root/MicrosoftIISv2")
Set APCollection = Service.InstancesOf("IISApplicationPool")

If bRecycleAll = 1 Then
    ' Recycle every pool, pausing 5 seconds between pools
    For Each APInstance In APCollection
        APInstance.Recycle
        WScript.Sleep(5000)
    Next
Else
    ' Recycle only the named pool; WMI names look like W3SVC/AppPools/<name>
    For Each APInstance In APCollection
        If UCase(APInstance.Name) = UCase("W3SVC/AppPools/" & strAppPoolName) Then
            APInstance.Recycle
        End If
    Next
End If
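If you would rather drive the staggered recycle from something other than a bat file, the same idea can be sketched in Python. This is only an illustration, with a few assumptions: RecycleAp.vbs sits in the working directory, cscript is on the path, and the function names recycle_command and rolling_recycle are made up for this example. The run and sleep parameters are injectable so the loop can be dry-run without touching a server.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def recycle_command(server: str, app_pool: str) -> List[str]:
    """Build the command line that runs RecycleAp.vbs against one server."""
    # Assumption: RecycleAp.vbs is in the current working directory.
    return ["cscript", "//nologo", "RecycleAp.vbs", server, app_pool]


def rolling_recycle(
    servers: Sequence[str],
    app_pool: str,
    pause_seconds: float = 60,
    run: Callable[[List[str]], object] = subprocess.check_call,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Recycle one server at a time, pausing between servers so the
    SQL servers are not hit by several cold caches at once."""
    for index, server in enumerate(servers):
        run(recycle_command(server, app_pool))
        # No pause is needed after the last server.
        if index < len(servers) - 1:
            sleep(pause_seconds)
```

A call such as rolling_recycle(["SERVER1", "SERVER2", "SERVER3"], "AppPool1") mirrors the bat file above, minus the trailing sleep after the last server.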

                       

· IIS Logs:

o Watch for bots/crawlers; they can hammer your SQL servers and pollute your cache with unused data. Got Logparser? (https://www.microsoft.com/downloads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&displaylang=en)

o Here are a few queries I run. For the c-ip, I use www.dnsstuff.com’s reverse IP lookup to find the owner.

logparser "Select c-ip, count(*) as c, propcount(*) as p from IISLOGNAME.log group by c-ip order by c desc"

logparser "Select cs(User-Agent), count(*) as c, propcount(*) as p from IISLOGNAME.log Where cs(User-Agent) LIKE '%Googlebot%' group by cs(User-Agent) order by c desc"
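If Logparser is not handy, the first query (hits and share of traffic per client IP) can be approximated with a few lines of Python. This is a rough sketch, not a Logparser replacement: it assumes a W3C-format log with a #Fields: header and space-separated columns, and the function name top_values is made up for this example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_values(lines: Iterable[str], field: str) -> List[Tuple[str, int, float]]:
    """Count the values of one log field and return (value, count, share)
    rows, busiest first. `share` is the fraction of all counted requests."""
    columns: List[str] = []
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns for the rows that follow.
            columns = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or field not in columns:
            continue
        values = line.split(" ")
        if len(values) != len(columns):
            continue  # skip malformed rows
        counts[values[columns.index(field)]] += 1
    total = sum(counts.values())
    return [(value, count, count / total) for value, count in counts.most_common()]
```

For example, passing the open log file and "c-ip" gives one row per client IP; passing "cs(User-Agent)" groups by user agent instead, and the rows can then be filtered for names such as Googlebot.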

· Performance counters:

o Keep an eye on your time in GC. Are you spending too much time in GC? This can cause performance problems, notably high CPU.

o Watch your available memory and CPU.

o ASP.NET: Requests Queued and Request Wait Time. If your web system is running hot, check your SQL servers.

· Be mindful of your international content.

o We notice our busiest time of the day is early in the morning. MSDN and TechNet are hosted in two different data centers on the west coast, and we notice when the east coast users start their day around 4-5 AM Redmond time.

o Watch your GC/cache at this time; there may still be some international cached content that’s no longer being used as European users head to bed. If you spend a lot of time in GC during this time, a scheduled task that runs the bat file and recycle script (see above) can help clear this up.

· By this point you may be asking, ‘Is it worth it?’ So far we are happy! Let’s compare the two systems.

· File based: (Before)

o 14 web servers in two data centers (for redundancy).

o RAID-5 with 300-400 gigs of content.

o A few SQL servers, mostly for personalization.

· New Web/SQL caching system: (After)

o 8 web servers in two data centers.

o RAID-5 with only 150 gigs for the OS, rendering code, and IIS logs.

o 4 SQL servers.

· Benefits:

o Save on two web servers.

o Save on terabytes of hard drive space that would have been used on the web boxes. (Minus 4 larger SQL servers)

o Improved our web response time to our customers through caching.

o Building out web servers is simpler and saves time.

o SQL Replication manages all the content updates versus a file based publishing system.

During the Visual Studio launch last fall, we wanted to know how well our system was performing. We pulled all but 3 web servers from rotation and only one SQL server. MSDN and TechNet traffic was managed without any issues. For network maintenance and failover reasons, we run two clusters of four web servers each and two clusters of two SQL servers each. Considering hosting costs in data centers, reduced web server purchases, extra drives, and expansion bays, so far we are happy.