Last week (Feb 20-24) we ran into a few problems with two of the three partners, MSN and Yahoo!. I will share the details I can and also ask for your input on improvements.
Background - all PIC issues, that is, problems with communication between a corporate domain and any of the three clouds, always come to Microsoft support first. The reason is to make sure the problem is not with the customer's LCS deployment or configuration, and also to buffer the three partners from questions they are not able to provide support for.
Last week MSN was impacted by a name change at the Root Certificate Authority. Anyone who had a certificate from Equifax had to get a new certificate and certificate chain. Anyone who communicates with a site that uses that authority needs the new certificate chain as well. Any workstation or server that can reach Windows Update is OK, but the Access Proxy usually sits in a DMZ and only requires port 5061 for connections, so it would not get the updated certificate. One of the other partners had this same problem, but at the moment of typing I can't remember who.
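For anyone who wants to confirm whether a box in the DMZ trusts the chain a partner presents, here is a minimal sketch. It is only an illustration, not a supported tool: the function names are mine, the host you pass in is whatever partner edge name applies to you, and it assumes Python is available on the machine. It tries a TLS handshake on port 5061 against the local trust store and reports whether the chain validated. Federation traffic normally uses mutual TLS, so a partner edge may reject the connection for lack of a client certificate even when the chain is trusted. Treat a "chain not trusted" result as the meaningful one.

```python
import socket
import ssl
from typing import Optional, Tuple


def issuer_organization(cert: dict) -> Optional[str]:
    """Return the issuer's organizationName from a dict shaped like
    the one returned by ssl.SSLSocket.getpeercert()."""
    for rdn in cert.get("issuer", ()):
        for key, value in rdn:
            if key == "organizationName":
                return value
    return None


def check_chain(host: str, port: int = 5061, timeout: float = 5.0) -> Tuple[bool, str]:
    """Attempt a TLS handshake validated against the local trust store.

    Returns (trusted, message). The host is supplied by the caller;
    no real partner host name is assumed here.
    """
    context = ssl.create_default_context()  # uses the system's trusted roots
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                issuer = issuer_organization(tls.getpeercert())
                return True, f"chain trusted, issued by {issuer}"
    except ssl.SSLCertVerificationError as exc:
        # The local store is missing a root or intermediate, or the name does not match.
        return False, f"chain not trusted: {exc.verify_message}"
    except OSError as exc:
        # Covers refused connections, timeouts, and other TLS failures.
        return False, f"connection failed: {exc}"
```

Running `check_chain("partner-edge.example.com")` (a made-up name) from the Access Proxy and from a workstation that reaches Windows Update would show whether only the DMZ machine is missing the new chain.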
There was also an upgrade to the MSN Address Book environment, which created the problem I mentioned in the Provisioning post - unable to add user to address book. Anyone who tried to add email@example.com from an MSN client would get that error; however, they could still have an IM conversation. This was researched and resolved between Thursday night and Friday morning. While I have not been given the root cause, I am aware that there was an upgrade to the Address Book environment, so we knew what the "last change" was, which is what we always ask customers!
On the Yahoo! side we had lots of reports from customers who were not seeing presence changes or were seeing incorrect presence. While there are reports that this was resolved on Friday, we are still investigating the issue for a few customers who continue to report problems. Again I do not have the root cause, but what I do know is that some additional servers were added to the environment but not added to the list of servers (see, even the best of us can forget all the steps necessary), and the subscription timeout was configured with a very high value.
So during this time we were scrambling to recognize what was happening (more customers calling in and reporting) and to report the issues correctly. We obviously don't have the best process for alerting everyone to a situation like this, and we are discussing that now. Your ideas are welcome.
Tom LCS Kid