I recently ran into an issue where every cluster command run from the command line failed with an authentication error. We could connect to the cluster from the GUI and from PowerShell, but every attempt to connect from the command line failed with the same terse message: Authentication failure.
I was pretty stumped, because I could not understand why the failure occurred only on the command line and not in the GUI or PowerShell. The TechNet article at http://technet.microsoft.com/en-us/library/cc719008.aspx#BKMK_Firewall shows that the cluster services on the head node and the compute nodes communicate over specific ports. For example, the command line tools use port 5800 to communicate with the HPC Job Scheduler Service on the head node, and the client tools on the enterprise network use port 5969 to connect to the same service. If you're having trouble communicating with the Job Scheduler Service on the head node, it is always a good idea to check which process is listening on which port. A useful tool for this is netstat.exe.
Running netstat -ano lists all connections and listening ports, shows addresses and port numbers in numerical form, and includes the PID of the process that owns each connection. Match those PIDs against the output of tasklist.exe and you can work out which process is listening on which port.
Doing this in my scenario revealed that a different application (VNC Server) was listening on port 5800, so the command line tools could not reach the scheduler service on that port. The solution was simply to reconfigure VNC to listen on a different port and then restart the HPC Job Scheduler Service.
After that, the command line interface to the job scheduler worked as expected.
I hope someone out there in Windows HPC land finds this post helpful someday.