I don't have all the answers but what we can do is get you started down the correct route to figuring out what is going on. This is what I recommend you gather when you have a lock that is interrupting users productivity
- Name of Proc or Process, and database that are being affected
- Gather SQLDiag output using SD_Detailed.XML (http://blogs.msdn.com/b/grahamk/archive/2008/04/14/performance-problems-in-sql-server-what-data-should-i-collect.aspx)
- Turn on trace flag 1222 and 1204 and collect errorlog.
- Gather ULS logs from each server
Based on this data we should be able to figure out what was happening during the lock, and then put together an action plan to resolve the issue.
I also recommend that ahead of time we set the following Categories under Diagnostic Logging to Verbose
- Timer Job