Issue with Federation Only Working in One Direction

I ran into this problem with a couple of customers recently while troubleshooting federation.  Company A and Company B had set up federation with each other, and both companies could see presence for the other.  A user in Company A could send an IM to a user in Company B, and the user in Company B could respond.  The problem appeared when a user in Company B started the conversation by sending an IM to a user in Company A.  The user in Company A received the IM, but when they tried to respond, they got a 504 Server time-out error.

Most people would assume this is a problem on Company A's side, when in fact it is a problem with Company B's Edge server.  If you take a SIPStack trace on Company B's Edge server, you see the following DIAGNOSTIC message:

DIAGNOSTIC: Host name resolution failure

TL_WARN(TF_DIAG) [5]0BAC.00F0::03/24/2010-18:37:05.804.00f24a97 (SIPStack,SIPAdminLog::TraceDiagRecord:1224.idx(142))$$begin_record
LogType: diagnostic
Severity: warning
Text: Host name resolution failure
SIP-Start-Line: INFO sip:<SIP URI>;opaque=user:epid:5OH1BQ4u2VCvDv3KnnBaowAA;gruu SIP/2.0
SIP-Call-ID: 525537919d6b4d189f33d21485855115
SIP-CSeq: 1 INFO
Data: fqdn=" <Front-end server FQDN> "
$$end_record

In the line Data: fqdn=" <Front-end server FQDN> ", <Front-end server FQDN> is the front-end server in Company B that the Edge server must forward Company A's reply to.  The Edge server tries to look that name up in DNS and the lookup fails.  That happens for one of two reasons: the front-end servers are not listed in the Edge server's hosts file, or the Edge server does not have access to internal DNS.
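If you want to confirm the name resolution failure without taking another trace, a quick check run on the Edge server itself can help.  The following is only a minimal sketch, assuming Python is available on the Edge server; the function name unresolved_hosts is made up for illustration, and it uses whatever the operating system resolver uses (hosts file first, then DNS).

```python
import socket


def unresolved_hosts(fqdns):
    """Return the names from fqdns that fail to resolve on this machine."""
    failures = []
    for name in fqdns:
        try:
            # Uses the OS resolver, so hosts file entries are honored.
            socket.getaddrinfo(name, None)
        except socket.gaierror:
            # The lookup failed, which matches the
            # "Host name resolution failure" diagnostic in the trace.
            failures.append(name)
    return failures
```

Call it with the FQDNs that appear in the Data: fqdn= lines of the trace.  Any name it returns is one the Edge server cannot resolve.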

We solved this issue by adding the front-end servers to the hosts file on the Edge servers.  Once we did that, both sides could send and receive IMs with each other.
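If you have several Edge servers, it is easy to miss a front-end server on one of them.  The sketch below is one way to check a hosts file for missing names, not a supported tool; the function name missing_hosts_entries and the sample names and addresses in the comment are hypothetical.  It assumes the usual hosts file layout of an IP address followed by one or more host names, with # starting a comment.

```python
def missing_hosts_entries(hosts_text, fqdns):
    """Return the names from fqdns that have no entry in the hosts file text."""
    known = set()
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split the entry on whitespace.
        entry = line.split("#", 1)[0].split()
        # The first field is the address; the rest are host names.
        known.update(name.lower() for name in entry[1:])
    # Host names are case-insensitive, so compare in lower case.
    return [name for name in fqdns if name.lower() not in known]


# Hypothetical example:
# text = "192.0.2.10 fe01.companyb.example\n"
# missing_hosts_entries(text, ["fe01.companyb.example", "fe02.companyb.example"])
```

On Windows the hosts file is at %SystemRoot%\System32\drivers\etc\hosts.  Read that file on each Edge server and pass its contents in along with the list of front-end server FQDNs.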