Me Too!

One way of telling how long a Microsoft employee has been working here is their reaction to the phrase “Bedlam DL3”. Just for grins, I was at lunch in the cafeteria with a bunch of co-workers and I blurted out, totally out of context: “Bedlam DL3”.  About 3 of the old-timers in the group responded, in chorus “Me Too!”

So why does everyone know about this rather mysterious phrase?

Well, Microsoft’s a pretty big organization.  We’ve got well over 100,000 mailboxes in our email infrastructure, and at times it can become rather cumbersome to manage all these.  One of the developers in our Internal Technologies Group (also known as ITG, basically the MIS department at Microsoft) was working on a new tool to manage communications with the various employees at Microsoft, and as a part of this tool, he created several distribution lists.  Each distribution list had about a quarter of the company’s mailboxes at the time on it (so there were about 13,000 mailboxes on each list).  For whatever reason, the distribution lists were named “Bedlam DL<n>” (maybe the tool was named Bedlam?  I’m not totally sure).

Well the name of the lists certainly proved prophetic.

It all started one morning when someone looked at the list of DL’s they were on, and discovered that they were on this mysterious distribution list called “Bedlam DL3”.  So they did what every person should do in that circumstance (not!).

They sent the following email:

To:   Bedlam DL3
From: <User>
Subject: Why am I on this mailing list?  Please remove me from it.

Remember, there are 13,000 people on this mailing list.  So all of a sudden, all 13,000 people received the message.  And almost to a person, they used the “reply-all” command and sent:

To:   Bedlam DL3
From: <User>
Subject: RE: Why am I on this mailing list?  Please remove me from it.
Me too! 

In addition, there were some really helpful people on the mailing list too:  They didn’t respond with just “Me Too!”  They responded with:

To:   Bedlam DL3
From: <User>
Subject: RE: Why am I on this mailing list?  Please remove me from it.
Stop using reply-all – it bogs down the email system. 

You know what?  They were right – the company’s email system did NOT deal with this gracefully.

Why?  Well, you’ve got to know a bit more about how Exchange works internally. 

First off, the original mail went to 13,000 users.  Assuming that 1,000 of those 13,000 users replied, that means that there were 1,000 replies, each sent to all 13,000 users.  And it turns out that a number of these people had their email client set to request read receipts and delivery receipts.  Each read or delivery receipt causes ANOTHER email to be sent from each of the 13,000 recipients back to the sender.  Assuming that 20% of the 1,000 users replying had read receipts or delivery receipts set, every one of the messages that they sent caused another message to be sent for every one of the 13,000 recipients. So how many messages were sent?

First there were the basic messages – 1,000 replies to 13,000 recipients – that’s 13,000,000 messages.
Next there were the receipts – 200 users, 13,000 receipts each – that’s an additional 2,600,000 messages.
So about 15.6 MILLION messages were sent through the system.  In about an hour.
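
For anyone who wants to check the math, here is a back-of-the-envelope sketch in Python. The function name and its parameters are made up for illustration; the 13,000, 1,000 and 20% figures are the assumptions used in the text, not measured values.

```python
def storm_message_count(list_size, repliers, receipt_percent):
    """Estimate how many messages a reply-all storm generates.

    list_size       -- mailboxes on the distribution list
    repliers        -- people who hit reply-all
    receipt_percent -- whole-number percentage of repliers who request
                       read or delivery receipts
    """
    # Every reply is delivered to everyone on the list.
    replies = repliers * list_size
    # Integer arithmetic keeps the counts exact.
    receipt_requesters = repliers * receipt_percent // 100
    # Each receipt-requesting reply gets one receipt back per recipient.
    receipts = receipt_requesters * list_size
    return replies, receipts, replies + receipts


if __name__ == "__main__":
    replies, receipts, total = storm_message_count(13_000, 1_000, 20)
    print(f"replies={replies:,} receipts={receipts:,} total={total:,}")
```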

So at a minimum, 15,600,000 email messages were delivered into people’s mailboxes.  But Exchange can handle 15,600,000 email messages EASILY.  There’s another problem that’s somewhat deeper.

An Exchange email message actually has TWO recipient lists – there’s the recipient list that the user sees in the To: line on their email message. This is called the P2 recipient list. This is the recipient list that the user typed in. There’s also a SECOND recipient list, called the P1 recipient list, that contains the list of ACTUAL recipients of the message. The P1 recipient list is totally hidden from the user; it’s used by the MTA (the message transfer agent) to route email messages to the correct destination server.

Internally, the P1 list is kept as the original recipient list, plus all of the users on the destination servers.  As a result, the P1 list is significantly larger than the P2 list.
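
To make the P1/P2 distinction concrete, here is a small sketch in Python. It is only an illustration of the idea described above, not Exchange's actual data structures: the `Message` class, the `route_to_server` function, and the `dl_members` and `home_server` dictionaries are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Message:
    # P2: what the sender typed on the To: line (visible to users).
    p2_recipients: List[str]
    # P1: the routing envelope (hidden from users, used by the MTA).
    p1_recipients: List[str] = field(default_factory=list)


def route_to_server(message: Message,
                    dl_members: Dict[str, List[str]],
                    home_server: Dict[str, str],
                    server: str) -> Message:
    """Build the copy of `message` that is sent to `server`.

    The P1 list is the original recipient list plus every list member
    whose mailbox lives on the destination server.
    """
    local_members = [
        member
        for name in message.p2_recipients
        for member in dl_members.get(name, [])
        if home_server[member] == server
    ]
    return Message(
        p2_recipients=list(message.p2_recipients),
        p1_recipients=list(message.p2_recipients) + local_members,
    )
```

With a one-entry To: line such as `["Bedlam DL3"]`, the P2 list stays one entry long while the P1 list grows with the number of members on each destination server.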

For the sake of argument, let’s assume that 1% of the recipients on each message (130) are on each server. So each message had 130 recipients in the P1 header, plus the original DL. Assuming 100 bytes per recipient email address, this bloats each email message by 13K. And this assumes that there are 0 bytes in the message – just the headers involve 13K.

So those 15,000,000 email messages collectively consumed 195,000,000,000 bytes of bandwidth. Yes, 195 gigabytes of bandwidth bouncing around between the email servers.
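
The bandwidth estimate is the same kind of arithmetic. A minimal sketch, using the 130 recipients per server and 100 bytes per address assumed above, and the message count rounded down to 15,000,000 (the function names are made up for illustration):

```python
def p1_overhead_bytes(recipients_per_message, bytes_per_address):
    """Header bytes added to one message by its P1 recipient list."""
    return recipients_per_message * bytes_per_address


def total_header_bytes(message_count, overhead_per_message):
    """Header bytes moved between servers for the whole storm."""
    return message_count * overhead_per_message


if __name__ == "__main__":
    per_message = p1_overhead_bytes(130, 100)
    total = total_header_bytes(15_000_000, per_message)
    print(f"{per_message:,} bytes per message, {total:,} bytes in total")
```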

Compounding this problem was a bug in the MTA that caused it to crash whenever it received a message with more than 8,000 recipients. But it crashed only AFTER processing the first 8,000 recipients. So 8,000 of the 13,000 recipients of the message would get it and 5,000 wouldn’t. When the MTA was restarted, it would immediately start processing the messages in its queue – and since the message hadn’t been fully delivered yet, it would retry the delivery, sending to the SAME 8,000 recipients and crashing again. And because of the way the Exchange store interacts with the MTA, even if we shut down the MTA, the messages would still queue up waiting on delivery to the MTA – shutting down the MTA wouldn’t fix the problem, it would only defer it (since the message store would immediately start delivering the queued messages into the MTA the second the MTA came back up).
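
The crash-and-retry loop is easier to see in a toy simulation. This Python sketch is a simplified model of the behavior described above, not the MTA's real code: the names are hypothetical, and it assumes that a crash throws away all delivery progress for the message being processed.

```python
from collections import Counter

CRASH_THRESHOLD = 8_000  # recipients processed before the simulated crash


class MtaCrash(Exception):
    """Stands in for the MTA process dying mid-delivery."""


def deliver(recipients, inbox):
    """Deliver one message, crashing once more than 8,000 recipients are seen."""
    for index, recipient in enumerate(recipients):
        if index == CRASH_THRESHOLD:
            raise MtaCrash("message has more than 8,000 recipients")
        inbox[recipient] += 1


def run_with_restarts(queue, attempts):
    """Start the MTA `attempts` times against the same message queue."""
    inbox = Counter()
    for _ in range(attempts):
        if not queue:
            break
        try:
            deliver(queue[0], inbox)
            queue.pop(0)  # only reached when delivery completes
        except MtaCrash:
            # No progress was recorded, so the message stays at the head
            # of the queue and is retried from the first recipient.
            pass
    return inbox
```

Running it against a single 13,000-recipient message shows the pattern: after three restarts the first 8,000 mailboxes hold three copies each, the last 5,000 hold none, and the message is still sitting in the queue.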

So what did we do to fix it? Well, the first thing that we did was to fix the MTA. And we tried to scrub the MTA’s message queues. This helped a lot, but there were still millions of copies of this message floating around the system.

It took about 2 days of constant work before the email system recovered from this one. When it was over, the team firefighting the crisis had t-shirts made with “I survived Bedlam DL3” on the front and “Me Too! (followed by the email addresses of everyone who had replied)” on the back.

To prevent anything like this from happening again, we added a message recipient limit to Exchange – the server can now enforce a site-wide limit on the number of recipients in a single email message.
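
In spirit, a recipient limit is a simple check made before a message is accepted for delivery. The Python sketch below illustrates that idea only; it is not how Exchange implements the setting, and the function name, exception and limit value are all hypothetical.

```python
class TooManyRecipients(Exception):
    """Raised when a message exceeds the configured recipient limit."""


def enforce_recipient_limit(recipients, limit):
    """Reject a message whose recipient count exceeds the site-wide limit."""
    if len(recipients) > limit:
        raise TooManyRecipients(
            f"{len(recipients)} recipients exceeds the limit of {limit}"
        )
    return recipients
```

With a limit of, say, 5,000, a 13,000-recipient message would be refused up front instead of being queued for delivery.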

Larry Osterman

Comments (29)
  1. Simon says:

    I’m curious why you don’t use a more relational system for storing the e-mail content in exchange…

    Surely you could store a message sent using exchange that went to X users on y servers just y times – or once if they all used the same storage?

    You would only need to store the details about who it was delivered to, resulting in a performance increase an order of magnitude as you’d just be saying "this user got a copy of this message". You wouldn’t need to store a P1 header except for those users outside of the exchange environment – the list of who got the message is available in the datastore already. Deleting messages becomes as easy as altering a record in the DB to say that it’s deleted and if all references to a message are deleted add it to a list of messages to purge from the system whenever a purge is done.

    It would result in dramatically less use of storage, less inter-server bandwidth and faster message access.

    To support saving changes to the message you could easily either create a new base message or a delta based on the original – which would again be more efficient…

  2. Larry Osterman says:

    Umm.. That’s actually the way that the messages are stored internally. Inside the message store, each of those messages to the 100 recipients in each store only occupies a single row in the underlying database.

    But even though Exchange is a REALLY good email system, I don’t know ANYONE who would recommend that you put all 55,000 Microsoft employees on the same email server, especially back in the Exchange 5.5 days. At an absolute minimum, this single server would represent a massive single point of failure for the entire corporate email system.

    There are multiple servers, and the Bedlam DL3 distribution list went to users on several servers. And as long as the message had to go to multiple servers…

  3. Omer van Kloeten says:

    Me Too!

    We had the same thing with our distribution lists a few years ago. We had about five of them for the whole organization (thousands of addresses all together).

    Someone decided to send a presentation file that concerned the entire organization to all the lists. One guy failed to open the attachment (the security setting was too high) and replied to all saying "I didn’t get the file".

    This was answered by a lot of people replying to all with messages like "Me too!", "Stop emailing me!", "What’s up, everyone?", "X Y, I know you’re responsible for this!" (X Y being a guy’s name (who had nothing to do with the matter)), "All of you will be fired" (This was sent by one of the high level bosses), etc.

    Eventually, they removed everyone’s rights to send mail to these lists except for admins…

    Another way of dealing with these kind of things… :)

  4. Anonymous Coward says:

    I have three questions on this topic:

    1) The new Exchange Permissions system (circa Office XP) has a "Do Not Forward" option but not a "Do Not Reply All" one. In my opinion the latter would be infinitely more useful at curbing embarrassment, confusion and unnecessary churn in day-to-day communications. Please add it! :-)

    2) Why can’t Outlook warn me, e.g. "This message will be sent to more than 1000 users. Are you sure you want to continue?" when I click send? It would make a lot of people think twice. (Don’t answer that, I know *why*… nested DLs and private members, etc. I still want the feature though. You’re Microsoft, make it happen).

    3) Why weren’t those distribution lists locked down so that only their owners could send mail to them? This is a feature in Exchange, right? And is it also a feature that only members of a DL can send mail to it? I’d hope so, but too often I see external Internet email coming to internal DLs. It seems like this functionality should be off by default.

  5. B.Y. says:

    Well, that was a pretty good test of how Exchange handles heavy loads.

    You guys should test it this way more often, requiring everyone to reply-all "me too" at least twice in each test.

  6. KC Lemson says:

    We do a massive amount of scalability testing with millions of messages going back and forth, no worries there =)

    (Yes, I know you’re joking, but I wanted to say it anyway :-)

  7. Larry Osterman says:

    AC: #1: Rights Management (Exchange permissions system) isn’t about preventing people from screwing up; Exchange has had lots of other mechanisms to prevent that (like marking DL’s as restricted) since Exchange 4.0 shipped. The Exchange rights management stuff is about privacy, not about preventing user mistakes.

    #2: The problem is that the Exchange client can’t know this. There are two aspects to this: First, the DL in question might be in a different forest, in which case it’s just a custom recipient in the local forest – there’s no DL membership to look at. The other reason is that DL membership can be restricted – Outlook can’t see the membership list, so it can’t tell how many users are on it.

    #3: That is exactly the user error that happened – the developer writing the application forgot to lock the DL down and bedlam broke out. And reversing the defaults isn’t a good idea IMHO – that would discourage people from using DL’s and create a support nightmare – Imagine the number of calls we’d get from frustrated Exchange Administrators:

    "I just created this distribution list, but my users can’t send mail to it!"

    The bottom line is that there’s no good answer to this. If we WERE to change the default, then the 95% case would be made harder (administrators would have to check a "allow users to send to this DL" checkbox). And since most of the time users want to be able to post to DL’s, administrators would get in the habit of checking the box every time they created a DL.

    The bottom line is that if it’s not a security risk (and this isn’t), choosing the default to be the one that users almost always want is best (IMHO).

  8. Scott says:

    Far be it from me to question you on a tech related matter, especially with Exchange, but #2 above doesn’t seem to ring true to me.

    "#2: The problem is that the Exchange client can’t know this. There are two aspects to this: First, the DL in question might be in a different forest, in which case it’s just a custom recipient in the local forest – there’s no DL membership to look at. The other reason is that DL membership can be restricted – Outlook can’t see the membership list, so it can’t tell how many users are on it. "

    If I right click on the DL I can select "properties" and view the members of the list can’t I?

  9. Matt Warren says:

    This was the underlying joke from my "Asteroid on Collision Course" post.

    BEDLAM DL3 Rocks!

  10. KC Lemson says:

    Scott – yes, in many cases that’s true. But in some cases such as the two that Larry mentioned (hidden membership and cross-forest), that’s not possible. So perhaps it’s best rephrased as "you can’t *guarantee* that the client would know this."

    Technically it’s certainly possible to find a way to make this work – but given enough code & testing, just about anything is :-)

  11. Simon says:

    Thanks for that Larry, not being an exchange admin I got the impression your mail storage was more along the lines of a traditional ‘copy for each user’ from:

    "So those 15,000,000 email messages collectively consumed 195,000,000,000 bytes of bandwidth."

    And assumed that if it sent it more than once, it would store it more than once.

  12. Larry Osterman says:

    The "bandwidth" in this case was the number of bytes of data being transferred around between the exchange servers.

    For an article on how Exchange keeps its documents, check out (Google is my friend, this was the first link I hit :)):

  13. Anonymous says:

    It all started one morning when someone looked at the list of DLs they were on, and discovered that they were on this mysterious distribution list called Bedlam DL3. So they did what every person should do in that circumstance…

  14. Anonymous says:

    Take Outs for 8 April 2004

  15. ANGIE says:


  16. Steve says:

    While not as intensive nor intrusive as this, we recently ran an exercise against our users, warning them of a potential virus that was incoming, describing it as coming from "Super-User <>", the subject line, and sample message text, and advising them not to open it (and if they did open it, not to click on the attachment).

    When I created the email, the return address shown (via OE) was indeed, but the reply to: address was the entire organization’s DL. (FYI: I created an HTML email that included an image from my webserver so I could read the logs to see who opened it, and included a readme.txt-space-space-space-space-space…space.html file with META redirect tags to an internal webserver’s page that said "Yeah, you shouldn’t have done that…" – and I could check the same logs to see who clicked the attachment.)

    Suckers, umm, "errant users" that opened it, realized they screwed up, and wanted to chew me out, replied to the whole organization (4500+ on the DL) (rather than the fictitious "root" account) and were promptly embarrassed when everybody read their nasty comments.

    A good time was had by all in our division. Senior management was not really amused, probably because they were up there in the top offenders list.

    (To keep this moderately on-topic – no noticeable impact to my six Exchange 5.5 servers, as the majority of good users actually deleted the mail upon receipt, and emptied their deleted items folder as well, which is something they hardly ever do!)

  17. I was there says:

    A minor correction: the T-shirt was created by a victim of Bedlam and was sold to raise money for charity (it was around the time of Giving Campaign). I still have mine. And I still remember those two wonderful days where I got NO email at all. Nothing.

  18. Reeves Little says:

    Please take me off this list.

    (Bedlam Veteran ;)

  19. Reeves Little says:

    Please take me off this list.

    (Bedlam Veteran ;)

  20. Anonymous says:

    Please stop replying to the new mailing list you were added to to ask why you were added. If the 40+ messages in your inbox from other confused coworkers haven’t made this abundantly clear, nobody knows. And we’re all sick of hearing about it. There are over 3,000 of us. I fail to understand how this sort of thing happens. This is almost 2005! Have you never used e-mail before? Do you not understand that a ton of us are…

  21. Anonymous says:

    Microsoft’s Bedlam DL3 mailing list ordeal…

  23. Anonymous says:

    Me Too! This is some funny stuff….

  24. Anonymous says:

    as in "magnum" Today I finished my important-and-urgent list early, and moved right onto my important-but-not-urgent list. At the top of that is to watch some Channel9 videos already. It turned into a day of sleuthing, but never fear, my…

  25. Anonymous says:

    So, I am on a few distribution lists that pertain directly to my every day job. I have a few others…

  26. Anonymous says:

    Well, this year I didn’t miss the anniversary of my first blog post.

    I still can’t quite believe it’s…

  27. Anonymous says:

    I was looking at Larry’s anniversary blog post, and that Bedlam DL3 link brought back some memories….

  28. Anonymous says:

    If you don’t know what Bedlam DL3 means, I encourage you to read this blog entry You Had Me At EHLO…

Comments are closed.