Dogfood at home – living the life…


Introduction

For the last few years in the Exchange team, we have embarked on a series of programs intended to bring our engineers closer to our customers and the experience they have actually using our products.  We’ve been doing a number of different things: holding sessions where we bring in customers to talk to our team about their experiences; hands-on labs where we play admin ourselves and provide feedback on the experience; and something we call “Dogfood at Home.”  Dogfood, as I’m sure many of you already know, is slang for using your own software, particularly of the beta variety (eating your own dog food).  Dogfood at Home refers to a project where individuals actually install an Exchange server at home and let friends and family use it for email.  My experience doing this is what this blog entry is about.

I have, of course, installed Exchange many times in the ten years I’ve worked on the team, but always as a test bed for playing around with specific features I was developing.  Examples include storage group operations, sync events, and the Exchange Best Practices Analyzer.  While I could tell you all about the nuts and bolts of the store process (at least, circa E2K), that doesn’t provide a lot of end-user experience, and I’ve never actually tried to administer an Exchange system.  And although I’ve been around for some time, most of my direct experience has been in the store code rather than admin, transport, or client access, so while I had a high-level understanding of those areas I had never delved too deeply into them.  In many ways, then, I went into this with less experience than many Exchange administrators have.  But – and this is pretty key – while I knew I didn’t know everything, I thought I knew a lot more than I actually did.  I was willing to play around with things that I shouldn’t have, and I was confident I could fix most problems without any help.  This puts me at the top of the class when it comes to being a dangerous administrator.  One other thing to note before I really begin: I was doing a single-box deployment with everything on it (AD, DNS, and the Exchange Hub, CAS, and Mailbox roles).  This probably isn’t typical (most would employ SBS for this type of thing), so my experience may be skewed from the norm a bit.

Installation

Installation and initial configuration mostly went pretty smoothly.  I did run into a few things that seemed a bit annoying, even though I knew why we did things the way we did.  The prereq checks (which I had a part in designing and developing) were a little cumbersome, since they had to be run multiple times as the issues were resolved, and in one case (a required .NET patch) I actually had to exit setup to allow it to install.  I’ve heard the complaint, of course, that if we can identify issues, why don’t we just automatically correct them; experiencing it first-hand like this brought the message home much more directly.  However, auto-correction is a much more difficult task than auto-detection (particularly when you are trying to drive the whole process with some kind of simple script).  I will continue to think about what we can do to improve this area in the future (I’ve summarized all my findings at the end).

My first real complication was that my router is connected to my ISP with a dynamic IP, so I needed to get that taken care of.  This involved registering my external domain name in one place; creating another domain name that could handle dynamic IP remapping and pointing it at my current IP address; pointing my router back to that site; and finally getting to a DNS server where I could add an MX record to map from one site to the other.  Not having much DNS experience, it took me quite some time to get that all straightened out.  And as it turns out, I made stupid mistakes just about every step of the way.  I didn’t point my external domain to the right DNS server, for one; I screwed up the MX record by putting in the wrong destination host name; and finally I messed up the accepted domains on the Exchange server.  The upsetting part about all of this is that there really isn’t any good way of tracking down these mistakes other than reviewing everything one step at a time.  This is particularly upsetting to me because for the last few years I’ve been involved in our diagnostic tool suite (namely the Exchange Best Practices Analyzer and the Exchange Troubleshooting Assistant).
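To make the chain concrete, here is a sketch of the records involved (all of the names and the IP address below are placeholders, not my actual domains): the registered domain’s MX record points at the dynamic DNS host name, and the dynamic DNS provider keeps an A record pointed at the router’s current IP.

```
; hypothetical zone fragment for the registered domain
example.com.                     3600  IN  MX  10 myhome.dyndns-provider.example.

; maintained by the dynamic DNS provider, updated whenever the router's IP changes
myhome.dyndns-provider.example.    60  IN  A   203.0.113.10
```

The short TTL on the A record is what makes the dynamic remapping workable: resolvers re-query it frequently enough to pick up a new address soon after the ISP reassigns one.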

So I finally got outbound mail delivery working, but the next spot of trouble was getting inbound mail going.  This turned out to require enabling anonymous users on the receive connectors.  It wasn’t terribly hard to figure out given the errors I was receiving, but the documentation specifically indicates that the receive connectors created out of the box should be configured exactly as needed to work right away.  This is true, but only if you also have an Edge server involved.  Since I’ve got a single-box deployment, I don’t have an Edge server.  I decided to give it a shot, and it worked, but the documentation, combined with a little apprehension about whether allowing anonymous users meant I was opening myself up to attack, made me rather nervous about it.  I finally had to check with other people on the team before I was sure I had done the right thing.  I will be talking to our documentation folks about this area (this is one of the items listed in my findings at the end).
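For anyone hitting the same wall, the change can be made from the Exchange Management Shell.  This is just a sketch: the connector name below is a placeholder for the default connector setup creates, so substitute your own server’s name.

```powershell
# See which connectors exist and what permission groups they currently allow
Get-ReceiveConnector | Format-List Name,PermissionGroups

# Add AnonymousUsers so Internet hosts can submit mail directly to the Hub
# ("Default SERVER01" is a placeholder for your default connector's name)
Set-ReceiveConnector "Default SERVER01" `
    -PermissionGroups AnonymousUsers,ExchangeUsers,ExchangeServers,ExchangeLegacyServers
```

Note that when you do have an Edge server, you want to leave anonymous submission off the Hub and let the Edge handle Internet mail instead.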

Next came certificates.  I know a little bit about the nuts and bolts of certificates, but actually getting them generated and applying them to a site was new to me.  The documentation seems to contain all the information needed; there’s just a lot of it to go through.  I used a self-signed cert at first and got that working okay, but I really wanted to avoid having my users get warnings when running OWA, so I decided to get a real certificate generated.  My first attempt was a mess because it sent the approval request to my dynamic DNS org admin, who never responded.  I then had to step back into the magical world of DNS and add a CNAME record so that my external domain could be accessed through a URL.  I reissued the cert and finally got things working as expected.
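For reference, the Exchange Management Shell side of this looks roughly like the following.  The host names, file paths, and service list here are placeholders and assumptions; the exact parameters depend on your CA and topology.

```powershell
# Generate a certificate request for the externally visible OWA host name
New-ExchangeCertificate -GenerateRequest -SubjectName "cn=mail.example.com" `
    -DomainName mail.example.com -Path c:\certreq.txt

# Once the CA issues the certificate, import it and bind it to the web and SMTP services
Import-ExchangeCertificate -Path c:\newcert.cer | Enable-ExchangeCertificate -Services IIS,SMTP
```

The CNAME comes into play because the name the CA validates (and the name in the cert’s subject) has to be one your users can actually resolve and browse to.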

The next step was to set up Outlook Anywhere.  On the Exchange side, this was pretty easy to find and do.  The Outlook side was a little more problematic: not knowing where to put the internal name vs. the external one, how to set up the Exchange proxy, etc. (note: this was for Outlook 2003 – Outlook 2007 makes this much, much easier).  I did finally get it right (with a little help from my friends), but I still haven’t gotten anyone else up on it yet (one person who did try ran into an issue I hadn’t seen).  Another point of confusion was that until the certificate was set up correctly, Outlook wouldn’t connect, and there was no good explanation of why that I could find.
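On the server side, the Exchange 2007 piece is a single cmdlet; this sketch uses placeholder names, and the authentication choice is an assumption (Basic over SSL is a common starting point):

```powershell
# Enable Outlook Anywhere (RPC over HTTP) on the Client Access server
Enable-OutlookAnywhere -Server SERVER01 -ExternalHostname mail.example.com `
    -ExternalAuthenticationMethod Basic -SSLOffloading:$false
```

It was the client side (manually entering the proxy server, the principal name check, and the authentication settings in Outlook 2003) that took the most fiddling.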

Now I had a reasonably fully functional system that I could let people use.  I only had a few friends using it sporadically so far, but I felt it had been a reasonable experience.  There were a few problems and I noted some things that could be improved (which was the main purpose of the exercise), but overall it hadn’t gone too badly.

Then came the calamity.

Big Problems

Everything was running fine for about a week.  I then decided I wanted to install MOM 2005 (Microsoft Operations Manager) and get that experience as well.  While not recommended (MOM was 32-bit only and I was running on a 64-bit platform), I asked around and it sounded like it should probably work okay.  I first had to install SQL 2005, and then I installed MOM.  I also installed SQL Reporting Services so that I could install MOM Reporting.  After fixing a few MOM prereq issues, I was able to install all of SQL and most of MOM correctly, except for MOM Reporting.  It kept complaining that it was unable to reach the SQL Reporting web service.  I checked the config and the SQL Reporting console, and everything looked like it was there okay.

I mulled this over for a couple of days and asked a few people what the problem might be, but I didn’t get anything figured out.  Then I tried to log on to my own mailbox just to check whether anyone was having problems.  I got a server unavailable error.  I soon realized that everyone had been getting these errors ever since I installed SQL and MOM.  My friends who were using the system never bothered to tell me about it; they just went back to their old email accounts temporarily and figured I knew what was going on and was working on fixing it.

So now I’ve got a completely unavailable system (OWA wasn’t working, and I hadn’t yet gotten anyone up on Outlook).  Even though no one was relying on my system for anything critical, it was still something of a panic-inducing moment.  I’ve had to debug live servers in the past, stepping through complex code looking for hard-to-find problems with hundreds of people screaming at me all the while, but this was somehow different and uniquely unpleasant.  I don’t think I ever fully appreciated the difficulties involved, or the skill set needed, in being a system administrator before this experience.  The first thing I did was look at the IIS protocol logs to see if they provided a clue, as well as the event logs.  I soon discovered that the error was coming from rpcproxy.dll, which was generating error 0x0000007E; I was able to look that up and see that it means a module could not be loaded.  I looked at the dependency chain for the dll, and it indicated it could not find dwmapi.dll (this contains the Desktop Window Manager APIs and is part of Windows).

Okay, I figured I was getting somewhere now.  Somehow, installing MOM and SQL had caused this dependency to be added and not resolved (or so I thought).  Strange, but I’d just find that dll, drop it down, and everything would be fine.  I searched on the web, but no dice.  The 32-bit version was available, but not the 64-bit version.  I had to find someone with a 64-bit machine that had this dll around, and I copied it over.  Problem solved?  Nope.  It was still reporting the same error, but now it was showing a couple of other dlls and undefined imported functions (as it turns out, these were all clues to the problem, but essentially red herrings of the first degree).  Well, it was a fun experiment, but I’d just get rid of MOM and SQL and put things back to normal.  Problem solved now?  Nope.  Still messed up in exactly the same way.  I enlisted some help from others on the team (a nice resource if you can get it) and tried a few things out, but nothing changed anything.  I played with a few settings (dangerous know-it-all admin, remember?), and that didn’t change anything either.

Giving up, I called each of my users (all four of them), and asked them if they had any critical mail they needed me to save off because I was giving up and about to rebuild the system from scratch.  I decided to give it one more day.

Finally, someone remembered a KB article about a similar situation in which a MOM 32-bit install flipped IIS into 32-bit mode.  I checked that out and did the steps needed to put it back into 64-bit mode.  Someone else also suggested making sure ASP.NET 2.0 was enabled for all the Exchange web sites, because they had heard that MOM 2005 might change this as well.  So I checked, and found the setting had been reset and needed to be specified again.  I tried it out, and hoorah, it did something!  It got rid of my 401 errors.  Unfortunately, now I was getting 500 errors.  A little more digging in the event log and I found I was getting access denied errors generated by some method somewhere.  So I played around with IIS permissions some more (the same ones I had mucked with trying to fix things in the first place), and finally got OWA back up and running.
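For the record, the 64-bit repair step boils down to a pair of commands run on the server (the .NET framework version directory below is an assumption based on the 2.0 framework of the time; check what is actually installed on yours):

```
REM Put the IIS 6 worker processes back into 64-bit mode
cscript %SystemDrive%\inetpub\AdminScripts\adsutil.vbs SET W3SVC/AppPools/Enable32BitAppOnWin64 false

REM Re-register the 64-bit ASP.NET 2.0 script mappings with IIS
%SystemRoot%\Microsoft.NET\Framework64\v2.0.50727\aspnet_regiis.exe -i
```

After that, the Exchange virtual directories still need ASP.NET 2.0 reselected on their ASP.NET tab in IIS Manager, which is the second fix mentioned above.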

Conclusions

So, there are obviously a number of things I learned from this, and there are things we could do better as a product for these types of situations.  Most of these we had already been planning to do.  Understand, however, that due to relative priorities, resources, and deadlines, I can’t make any guarantees on when any of these will actually be delivered, but this is what I’ve got:

  • Diagnostics in a cloud.  As useful as ExBPA and ExTRA are, they are still limited by the fact that they can only test things from inside an organization.  The idea here is to have some kind of service running on the Internet that can test a system from an external perspective as well.  This could be used both for monitoring, to alert an admin when problems start occurring, and for additional diagnostics and troubleshooting to assist in root cause analysis.
  • Client troubleshooting.  We have a performance troubleshooter, a mailflow troubleshooter, and a database troubleshooter, but we don’t have any wizard to help troubleshoot client access problems.
  • Documentation improvements.  There are a few things I found in the docs that can be improved:
    • Hubs need Anonymous Users allowed if no Edge servers.  The documentation should at least note this.
  • Prereq improvements.  There are a number of things we can try to do to improve the prereq experience in setup, but we are somewhat limited in our flexibility in this.  Nevertheless, this area is worth more consideration.
  • New ExBPA rules.  A number of new rules suggested themselves over the course of this exercise:
    • Generate a warning if anonymous users are not enabled on hub servers.  This one is a bit tricky, because we can’t programmatically determine if Edge servers are in the system (they are not in Active Directory).  If they are, you definitely don’t want anonymous users allowed on the hubs.  If they aren’t, you definitely do want them allowed.
    • Certificate verification improvements.  We already do some certificate validation, but we can do more here.
    • Verify that IIS is in 64-bit mode.  Not much explanation needed here, although this should probably be a part of both ExBPA and the client access troubleshooter. 
    • Verify that Exchange web sites have ASP.NET 2.0 set.  Same as above.
    • Verify that Exchange web sites are configured to use Local System.  Same as above.

All in all, I’ve gotten a lot out of this exercise and I’m very happy with the results.  As I said at the beginning, a number of members of the product team have done the same thing and gained new perspective on their work.  I think you should already have seen some of the results of this in both Exchange 2003 and 2007, and will continue to see more as time goes on.

Jon Avner

Comments (26)
  1. Lukas Beeler says:

    Wow. Great post.

    It’s interesting to see that the gap between administrators and developers widens with every day.  I don’t think that’s a good thing, though.

  2. Chris Warwick says:

    Hi Jon,

    This is an excellent idea!  Glad you got everything working in the end and enjoyed the experience :-)

    Since you’re obviously running against multiple Exchange orgs you will probably find (like me) that it’s a real pain that Outlook (any version) will only allow ONE Exchange connection to be configured.

    Can anything be done about this?  I can have multiple POP/IMAP but only One Exchange.  Worst thing is I can’t even start multiple copies of Outlook with different profiles to connect to different Exchange systems.

    Thanks

    Chris

  3. subsoniq says:

    heh, I’ve been an Exchange admin for 7 years and have been running my own Exchange at home (MSDN versions) for about 5 years.  I learned long ago from both work and home experiences not to run anything on an Exchange server other than Exchange (except of course for the obvious things like MOM agents or Exchange AV).

    I currently have 2 physical machines running VMWare Server and run separate images for AD DCs (1 on each machine) and Exchange.  When I get around to rolling out E2K7 I’ll run a separate image for the Edge server and one for the hub/mailbox/CAS (though I might even run CAS on its own image).  I also run my linux DNS/firewall servers on images from these machines (dual core rocks!).  I try to have servers/images duplicated on both physical machines in case the hardware on one eats itself, which has happened several times in the past 5 years.

    I think it’s a great idea for the exchange devs to actually run exchange at home, to get a feel for installing and maintaining an exchange environment.  All product dev teams should have to go through something like this if you ask me, it can be a real eye opener to actually have to use the product you make.

  4. JonAv says:

    Regarding Lukas’ comments about the gap between dev and admin: I agree this is a bad thing.  Hopefully, programs like dogfood at home will reduce this gap and reverse the trend.  I think it’s a good start, anyway.

    Regarding Chris’ comments about Outlook only working with one Exchange server, yes this is a pain but I don’t know of any changes in the works to do anything different here.

    Regarding subsoniq’s comments, I have recently reinstalled MOM into a virtual server on my Exchange machine, and that did work much better (once I got DNS set up correctly between the main machine and the virtual image, anyway).

    Thanks for the comments.

  5. Peter O'Dowd says:

    Yes, I know exactly the problem you’re mentioning about installing SQL Reporting Services for MOM.  We ran into exactly the same issue on a dedicated MOM server.  What was your screen resolution?  

    Are you ready for this…  If you run in 800×600 mode then an ‘Apply’ button you require is not visible.  I’d love to meet the dev who wrote that code!

  6. JonAv says:

    I’m not sure what you are referring to regarding the Apply button, but that sounds really bad.  The issue I later ran into when installing MOM on its own virtual server was that to get MOM Reporting to recognize the SQL Reporting vdirs, I needed to enter them explicitly even though the defaults were the expected values.  I’m not sure why this worked, but it is apparently a known issue.

  7. Greg Lowe says:

    Hey Jon, Great Article. I believe DFAH is a great idea. As both a former developer and former admin, I often credit my success on being able to see the world from both sides. I believe this will help see how the "other half" lives.

    One thing that you neglected to mention when running without an Edge server is the anti-spam features which can be enabled by running a script (in the scripts directory).

    I also have to agree with the need to be able to configure multiple Exchange connections.

  8. Ian Moran says:

    Great article, and very honest.  You made the certificate side of things sound easy though – did you install one with multiple common names?  If not, presumably you get two warnings when starting Outlook?  There needs to be more documentation around this aspect – you can’t approach certificates in Exchange 2007 the way you did in Exchange 2003.  Some aspects of the Cert Manager on the Exchange 2007 box seem unsupported.

    I also ran into exactly the same issue as you where IIS had been set to 32bit – that was a nasty one to recover from but thankfully a Google turned up someone with a similar issue.

    Perhaps System Center Essentials and System Center Operations Manager can be extended to support remote monitoring of a client’s Exchange server, with specific regard to mail flow and correct client config – i.e. is OWA operational, Server ActiveSync, Outlook Anywhere, etc.

  9. Dani says:

    Hey, I really enjoyed this post!  Next time you do testing, though, make sure that you create a virtual server for anything that is unrelated to Exchange.  I have created testing environments running 64-bit Windows, Exchange 07 & IIS, and I try not to run anything else on the box… just throw up a virtual server on the base machine and voilà!

  10. JonAv says:

    Regarding Greg’s comment, I have not yet gotten around to setting up anti-spam, but it is on my list.

    Regarding Ian’s comment, I have also not yet added a cert for the AutoDiscover vdir.  I only have one for the owa vdir.  I don’t believe I get any errors or warnings because of this, though (at least, I don’t remember seeing any).  I know I don’t have free/busy yet because I can’t connect to AutoDiscover.  I agree setting up a cert is a complex process, and that is something that is being looked at for simplification (certs are pretty complex things to begin with, though).

    Regarding Dani’s comments, I know I should take that to heart, but I did recently install Expressions Web right on the Exchange machine.  Fortunately, that didn’t seem to break anything.

    Note: I will post an update to this blog in a couple of months and let everyone know of my further adventures in administering Exchange.  Also, several others in the team will likely be posting their own experiences soon, so stay tuned for that.

  11. Rich N. says:

    How did you get around the problem of your wife nagging you that you’re spending too much time in the basement?!

  12. JonAv says:

    Actually, we don’t have a basement.  And my wife reads this blog.

  13. Keif Machado says:

    I too love the idea of "Dogfood at Home".  Keep up the good work and let us know what else happens.

    I do not think that a lot of companies will see the value in enabling the Autodiscover service to the external world.  Plus, it sounds like creating an open hole for not much benefit.  We have been setting up customers with separate internal and external vdirs for OWA and other CAS services.  This way we can associate a cert with the necessary vdirs without affecting the internal stuff.  This also reduces the need for a wildcard cert with multiple names or SANs.

  14. Adam Fazio says:

    This is a great idea and I commend the team for taking it on.  I think one of the big lessons as well is about combo-boxes (aka don’t install SQL/MOM on your Exchange server without research, planning, testing, and following a standard release process.  If you want to combo-box it, use virtual servers, etc.)

    Anyway, really cool to turn real world experiences into product improvements!

  15. Aaron Marks says:

    Excellent read, and your honesty is definitely appreciated, since this is a very similar situation to what we IT consultants and admins go through with new products.  Many of us test them the exact same way, and I ran into quite a few of the same issues as you back when I was testing with Beta 2 (I have since moved my server to a production environment).

    I think it is worth mentioning though that there are easier ways to do this with Dynamic DNS.  The best service IMO is from a company by the name of DynDNS.  Since a dynamic IP is being used, though, it is obviously not possible to set up a reverse lookup, so you will need to send mail out using another mail server (not using DNS); this can be configured using the send connector.  Then just make sure that the DynDNS client is installed on the server so that it can update all of the necessary hostnames.

    Then for certificates for things like this I always recommend GoDaddy.  GoDaddy certs are only $20 and work with all Microsoft stuff (including WM5 or greater phones).

  16. JonAv says:

    I did use both GoDaddy and DynDNS.  My router supports DDNS, so that made it somewhat easier, but I still managed to screw it up in several different ways.

    Even using those two, you still need to set up your DNS records manually so that mail and client access get properly routed from the GoDaddy domain name to the DynDNS one, right?  Or do you know of an easier way to do this, Aaron?

  17. Lee says:

    Excellent article!  I personally appreciate that you take the time to do something like this.  It’s amazing the things you run into when installing multiple products, especially in larger deployments.  It’s also nice to know I’m not the only one that sees things like this.  :)

  18. JT says:

    Good work.

    I wonder on the actions listed if the approach is a little too ‘reactive’ ?

    Might it be better to take a step back and look at what occurred from a higher level, i.e. you installed something (MOM/SQL), and as a direct result it changed settings in something else (IIS).

    Instead of watching for this specific fault occurring again, why not try to develop something that watches all settings (perhaps in the registry?) and reports back on anything that could be deemed a bit unusual or suspicious?  Like an advanced event viewer tied into the installation, or a separate program – you could call it MS Install Monitor v1.0… heh.

    ok, easier said than done i know .. but i’m wary that tomorrows problems will not be the same, so something that is analysing the system as a whole may point you to the problem area more quickly.

    Similar i guess to the Vista journal that assists trouble shooting by recording all change and associated errors on a timeline. Except this would just be aimed at watching what on the system has changed during the course of an install and rolling up those changes into a summary with warnings of irregularities etc..

    once again, good work all round :o)

  19. JonAv says:

    It’s a good point JT, and at the OS and research level people are looking into this kind of approach.  However, how do you determine when something is unusual or suspicious?  The thing we want to try to do with any analysis program is to keep the signal to noise ratio high (i.e. make as many of the issues generated actionable as possible).  ExBPA, being a highly conditioned set of tests, is pretty good at this.  Any kind of more generic solution you try to apply won’t be as good.  While the brute force method does take a lot of work and requires continuous maintenance to keep it up to date, it is at least pretty certain to be effective, whereas other approaches aren’t so certain.

  20. aaronmarks says:

    Jon, I think I figured out your complication with DNS and what must have been a point of frustration.  You registered two separate domain names, one with GoDaddy and the other with DynDNS, when in reality you only needed one.  If your original domain name was registered with GoDaddy (I’m assuming this is the case), then you could have just changed the Name Server records on GoDaddy’s page under the DNS settings to point to DynDNS’s name servers, which are: ns1.mydyndns.org, ns2.mydyndns.org, ns3.mydyndns.org, and ns4.mydyndns.org.

    Once you have delegated your domain to the DynDNS name servers then you can register your Custom DNS with DynDNS.  During registration you just enter in the same domain name that you already have registered over at GoDaddy.  Then in about 15 minutes after registration DynDNS will realize that it has "received delegation" for the domain name.  At this point you can just edit the custom DNS settings, and everything will be simple.  One domain name controlled from one place.

    The only hard part still though, and there is no way around it, is that you can’t register a PTR record.  This means that if you try configuring your Exchange server to use DNS for SMTP that many servers will reject your mail due to a failed reverse lookup.  The workaround for this is to configure the send connector to use your ISP’s SMTP server.  Back about a year ago when I had this setup I just used mail.comcast.net.

    Hope this helps Jon!

  21. JonAv says:

    Thanks Aaron.  I was not aware DynDNS had the capability.  The last bit, though, was something I wanted to avoid.  

  22. aaronmarks says:

    Do you mean that you wanted to avoid rejections due to failed Reverse Lookup? Or, do you mean that you wanted to avoid using your ISP’s SMTP server?

    If it was the latter, then there are also pay-for services that you can relay through, and it is even possible to relay with SASL authentication and to use TLS with smarthosts.  Credentials are supplied with the get-credential cmdlet, I believe.

  23. JonAv says:

    The latter, mainly.  We do have our own relay server we can use at MS (this also functions as our name server which is where I put the MX record).  I’m a little fuzzy now on whether I’m using the mail relay portion or not.  I’ll have to take a look at my config again.

  25. aaronmarks says:

    Jon, you still don’t have a PTR record then.  It is really important to have a PTR record in order to send email to a lot of mail servers.

    Do you understand what I was talking about with the Send Connector and how you need to set it up to relay through another host?  Your hostname is not valid on the Internet because it does not have a PTR record (that is not possible with a dynamic address from your ISP).

  26. David Williams says:

    Great article.  I encountered a similar problem – installing BES 4.1.3 on my lovely 64-bit server set IIS back into 32-bit mode, and suddenly OWA no longer worked.  Although, like yourself, I was eventually able to work out what had happened and then how to resolve it, it’d be great if Windows would alert you when this is taking place.

Comments are closed.