In case you haven’t seen it, I’ve been writing an ADFS Deep-Dive series for the past 10 months. Here are links to the previous articles:
- ADFS Deep-Dive: Primer
- ADFS Deep-Dive: Comparing WS-Fed, SAML, and OAuth
- ADFS Deep Dive: Planning and Design Considerations
- ADFS Deep Dive: Certificate Planning
- ADFS Deep-Dive: Onboarding Applications
Before you start troubleshooting, ask the users that are having issues the following questions and take note of their answers as they will help guide you through some additional things to check:
- Where are you when trying to access this application? At home? Office?
- Are you connected to VPN or DirectAccess?
- Can you log into the application while physically present within a corporate office?
- What browser are you using?
- How are you trying to authenticate to the application? Username/password, smartcard, PhoneFactor?
If you’re not the ADFS Admin but still troubleshooting an issue, ask the ADFS administrators the following questions:
- Is the problematic application SAML or WS-Fed?
- Who is responsible for the application? Someone in your company or a vendor?
- Is the issue happening for everyone or just a subset of users?
- Can you get access to the event logs on the ADFS servers and the Proxy/WAP servers?
The best advice I can give you for troubleshooting SSO transactions with ADFS is to first pinpoint where the error is being thrown or where the transaction is breaking down. Is the transaction erroring out on the application side or the ADFS side? Knowing that will cut down the number of configuration items you’ll have to review. The user won’t always be able to answer this question because they may not be able to interpret the URL and understand what it means. Grab a copy of Fiddler, the HTTP debugger, which will quickly show you where it’s breaking down:
Make sure to enable SSL decryption within Fiddler by going to Fiddler options:
Then check “Decrypt HTTPS traffic”. I also check “Ignore server certificate errors”.
Warning: Fiddler will break a client that is trying to perform Windows integrated authentication against the internal ADFS servers, so you can only test with Fiddler in one of the following scenarios:
- The user that you’re testing with is going through the ADFS Proxy/WAP because they’re physically located outside the corporate network.
- You have hardcoded a user to use the ADFS Proxy/WAP for testing purposes.
- The application is configured to have ADFS use an alternative authentication mechanism.
- ADFS is hardcoded to use an authentication mechanism other than integrated authentication.
- You have disabled Extended Protection on the ADFS servers, which allows Fiddler to continue to work during integrated authentication. This is not recommended.
The classic symptom that Fiddler is causing an issue is that the user is continuously prompted for credentials by ADFS and can’t get past the prompt.
If you recall from my very first ADFS blog in August 2014, SSO transactions are a series of redirects or HTTP POSTs, so a Fiddler trace will typically let you know where the transaction is breaking down.
Frame 1: I navigate to https://claimsweb.cloudready.ms. It performs a 302 redirect of my client to my ADFS server to authenticate.
Frame 2: My client connects to my ADFS server https://sts.cloudready.ms. My client submits a Kerberos ticket to the ADFS server or uses forms-based authentication to the ADFS WAP/Proxy server.
Frame 3: Once I’m authenticated, the ADFS server sends me back some HTML with a SAML token and JavaScript that tells my client to HTTP POST it over to the original claims-based application: https://claimsweb.cloudready.ms.
Frame 4: My client sends that token back to the original application: https://claimsweb.cloudready.ms. Claimsweb checks the signature on the token, reads the claims, and then loads the application.
Just remember that the typical SSO transaction should look like the following:
- Initial request to application.
- Redirect to ADFS for authentication
- User sent back to application with SAML token.
Identify where the transaction broke down – On the application side on step 1? When redirected over to ADFS on step 2? Or when being sent back to the application with a token during step 3? Also, to make things easier, all the troubleshooting we do throughout this blog will fall into one of these three categories.
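If you copy the requests out of a trace, the three-step check can be sketched in a few lines of Python. This is a minimal illustration and not a Fiddler feature: the `find_breakdown` function, the `(url, status)` frame format, and the rule that any 4xx/5xx status marks the break are assumptions made for the example. Note that a 401 challenge during integrated authentication would also be flagged by this simple rule.

```python
from urllib.parse import urlsplit

# The three troubleshooting categories used throughout this post.
STAGES = {
    1: "initial request to application",
    2: "redirect to ADFS for authentication",
    3: "user sent back to application with token",
}


def find_breakdown(frames, app_host, adfs_host):
    """Return (stage, description) for the first failing frame, or None.

    frames is a list of (url, status) tuples in trace order (hypothetical format).
    """
    seen_adfs = False
    for url, status in frames:
        host = (urlsplit(url).hostname or "").lower()
        if host == adfs_host.lower():
            stage = 2
            seen_adfs = True
        elif host == app_host.lower():
            # Application traffic after the ADFS hop is the token coming back.
            stage = 3 if seen_adfs else 1
        else:
            continue  # unrelated traffic (images, telemetry, and so on)
        if status >= 400:
            return stage, STAGES[stage]
    return None


if __name__ == "__main__":
    trace = [
        ("https://claimsweb.cloudready.ms/", 302),
        ("https://sts.cloudready.ms/adfs/ls/?wa=wsignin1.0", 200),
        ("https://claimsweb.cloudready.ms/", 500),
    ]
    print(find_breakdown(trace, "claimsweb.cloudready.ms", "sts.cloudready.ms"))
```

A result of `None` only means no frame returned an error status; a transaction can still stall without one, so treat this as a first pass over the trace.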
Everything we go through now will look familiar because in my last blog I outlined everything required by both parties (ADFS and the application owner) to make SSO happen. Not everything in that checklist will cause things to break down, though. Consequently, I pared that list down to only the items that will break SSO and reorganized them into the above troubleshooting categories. We’re now going to step through each:
1.) The SSO Transaction is Breaking during the Initial Request to Application
If the transaction is breaking down when the user is just navigating to the application, check the following:
Is RP Initiated Sign-on Supported by the Application?
If the transaction is breaking down when the user first goes to the application, you obviously should ask the vendor or application owner whether there is an issue with the application. But if you find out that this request is only failing for certain users, the first question you should ask yourself is “Does the application support RP-Initiated Sign-on?”
I know what you’re thinking, “Why the heck would that be my first question when troubleshooting?” Well, sometimes the easiest answers are the ones right in front of us but we overlook them because we’re super-smart IT guys. You know as well as I do that sometimes user behavior is the problem and not the application. 🙂
If the application doesn’t support RP-initiated sign-on, the user won’t be able to navigate directly to the application to gain access and will need special URLs to access it. Ask the user how they got to the application: through a portal that the company created, which hopefully contains these special URLs, or through a shortcut or favorite in their browser that navigates them directly to the application? Doh! If they answer with either the shortcut or the favorite, you’ll need to have them access the application the correct way, using the intranet portal that contains the special URLs. We often overlook these easy ones.
There can obviously be other issues here that I won’t cover like DNS resolution, firewall issues, etc.
2.) The SSO Transaction is Breaking when Redirecting to ADFS for Authentication
If the transaction is breaking down when the user is redirected to ADFS for authentication, then check the following items:
Is the ADFS Logon URL correctly configured within the application?
Applications differ, especially in how you configure them. Some you can configure for SSO yourself; for others, the vendor has to do it. Consequently, I can’t recommend how to make changes to the application, but I can at least guide you on what might be wrong. If the application is redirecting the user to the wrong URL, that user will never authenticate against ADFS and will receive an HTTP 404 error (Page not found). This should be easy to diagnose in Fiddler: look at what URL the user is being redirected to and confirm it matches your ADFS URL.
Also make sure that your ADFS infrastructure is online both internally and externally. Test from both internal and external clients and try to get to https://<sts.domain.com>/federationmetadata/2007-06/federationmetadata.xml.
Key Takeaway: Regardless of whether the application is SAML or WS-Fed, the ADFS Logon URL should be https://<sts.domain.com>/adfs/ls with the correct WS-FED or SAML request appended to the end of the URL.
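To make that check repeatable, here is a small Python sketch that inspects a redirect URL copied from Fiddler. The `check_logon_url` helper and its problem messages are made up for this example; the only facts it relies on are the `/adfs/ls` path described above and the standard `SAMLRequest` (SAML) and `wa` (WS-Fed) query parameters.

```python
from urllib.parse import urlsplit, parse_qs


def check_logon_url(url, adfs_host):
    """Return a list of problems found in a redirect URL (empty list = looks fine)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    problems = []
    if parts.scheme != "https":
        problems.append("scheme is not https")
    if (parts.hostname or "").lower() != adfs_host.lower():
        problems.append("host does not match the ADFS service name")
    # Accept /adfs/ls with or without a trailing slash, in any letter case.
    if parts.path.rstrip("/").lower() != "/adfs/ls":
        problems.append("path is not /adfs/ls")
    if "SAMLRequest" not in query and "wa" not in query:
        problems.append("no SAMLRequest or wa parameter in the query string")
    return problems


if __name__ == "__main__":
    redirect = ("https://sts.cloudready.ms/adfs/ls/?wa=wsignin1.0"
                "&wtrealm=https%3A%2F%2Fclaimsweb.cloudready.ms")
    print(check_logon_url(redirect, "sts.cloudready.ms") or "looks fine")
```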
Is a SAML request signing certificate being used and is it present in ADFS? (Optional)
How do you know whether a SAML request signing certificate is actually being used? Well, look in the SAML request URL and if you see a signature parameter along with the request, then a signing certificate was used:
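If you would rather not eyeball a long URL, the same check can be sketched in Python. The helper names below are hypothetical; the parameter names (`SAMLRequest`, `RelayState`, `SigAlg`, `Signature`) are the ones the SAML HTTP-Redirect binding defines.

```python
from urllib.parse import urlsplit, parse_qs

REDIRECT_PARAMS = ("SAMLRequest", "RelayState", "SigAlg", "Signature")


def redirect_binding_params(url):
    """Report which SAML redirect-binding parameters appear in the URL."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {name: name in query for name in REDIRECT_PARAMS}


def request_is_signed(url):
    """True when the URL carries a SAMLRequest together with a Signature."""
    present = redirect_binding_params(url)
    return present["SAMLRequest"] and present["Signature"]


if __name__ == "__main__":
    # Shortened, made-up values; a real request is much longer.
    url = "https://sts.cloudready.ms/adfs/ls/?SAMLRequest=abc&SigAlg=x&Signature=xyz"
    print(redirect_binding_params(url))
    print("signed:", request_is_signed(url))
```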
Now check to see whether ADFS is configured to require SAML request signing:
Get-AdfsRelyingPartyTrust -Name "shib.cloudready.ms"
By default, relying parties in ADFS don’t require that SAML requests be signed. Although it may not be required, let’s see whether we have a request signing certificate configured:
Even though this relying party isn’t configured to require signed requests, this would be a problem: the application is signing the request, but I don’t have a signing certificate configured on this relying party trust. If the application is signing the request and you don’t have the necessary certificate to verify the signature, ADFS will throw an Event ID 364 stating that no signature verification certificate was found:
Key Takeaway: Make sure the request signing is in order. It isn’t required on the ADFS side but if you decide to enable it, make sure you have the correct certificate on the RP signing tab to verify the signature. You would need to obtain the public portion of the application’s signing certificate from the application owner.
Is the Request Signing Certificate passing Revocation?
Also, ADFS may check the validity and the certificate chain for this request signing certificate. This configuration is separate on each relying party trust. To check, run:
Get-AdfsRelyingPartyTrust -Name <RP Name>
You can see here that ADFS will check the chain on the request signing certificate. If you would like to confirm this is the issue, test this setting by doing either of the following:
1.) Temporarily disable revocation checking entirely and then test:
Set-AdfsRelyingPartyTrust -TargetIdentifier "https://shib.cloudready.ms" -SigningCertificateRevocationCheck "None"
2.) Or export the request signing certificate and run certutil to check the validity and chain of the cert:
certutil -urlfetch -verify c:\requestsigningcert.cer
I even had a customer where only ADFS in the DMZ couldn’t verify a certificate chain but he could verify the certificate from his own workstation. You can imagine what the problem was – the DMZ ADFS servers didn’t have the right network access to verify the chain.
Is the application sending the right identifier?
If the application does support RP-initiated sign-on, the application will have to send ADFS an identifier so ADFS knows which application to invoke for the request. The methods for troubleshooting this identifier are different depending on whether the application is SAML or WS-FED. We need to ensure that ADFS has the same identifier configured for the application.
From Fiddler, grab the URL for the SAML transaction; it should look like the following:
See that SAMLRequest value that I highlighted above? It’s a Base64-encoded value, but SSOCircle.com, or sometimes the Fiddler TextWizard, will decode it:
Select Redirect and then click decode:
If it doesn’t decode properly, the request may be encrypted. If it does decode properly and we click on the XML View, it should look like this:
Here you can see where my relying party trust has the same identifier as the value in the SAML request so the identifier the application is sending us here is fine:
If the identifier is wrong, I’m not even given a chance to provide my credentials and am immediately given an error screen:
I captured the following URL from Fiddler during a WS-FED transaction with the wrong identifier being passed from the application:
If you URL decode this highlighted value, you get https://claims.cloudready.ms. And you can see that ADFS has a different identifier configured:
Another clue would be an Event ID 364 in the ADFS event logs on the ADFS server that was used stating that the relying party trust is unspecified or unsupported:
Key Takeaway: The identifier for the application must match on both the application configuration side and the ADFS side. Look for event IDs that may indicate the issue. If you don’t have access to the event logs, use Fiddler and, depending on whether the application is SAML or WS-Fed, determine the identifier that the application is sending ADFS and ensure it matches the configuration on the relying party trust.
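If you prefer not to paste requests into a third-party site, the identifier can also be pulled out of the captured URL locally. The Python below is a sketch under a few assumptions: the function names are mine, the SAML request uses the redirect binding (Base64 plus raw DEFLATE) and is not encrypted, and the URL is copied exactly as captured so the Base64 value is still percent-encoded.

```python
import base64
import zlib
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

SAML_ASSERTION_NS = "urn:oasis:names:tc:SAML:2.0:assertion"


def decode_saml_redirect(saml_request):
    """Decode a SAMLRequest value that has already been URL-decoded."""
    raw = base64.b64decode(saml_request)
    # The redirect binding uses raw DEFLATE with no zlib header, hence wbits=-15.
    return zlib.decompress(raw, -15).decode("utf-8")


def identifier_from_url(url):
    """Return the identifier the application sent to ADFS, or None.

    SAML: the Issuer element inside the decoded SAMLRequest.
    WS-Fed: the wtrealm query parameter.
    """
    query = parse_qs(urlsplit(url).query)  # parse_qs also URL-decodes the values
    if "SAMLRequest" in query:
        root = ET.fromstring(decode_saml_redirect(query["SAMLRequest"][0]))
        issuer = root.find("{%s}Issuer" % SAML_ASSERTION_NS)
        if issuer is not None and issuer.text:
            return issuer.text.strip()
        return None
    if "wtrealm" in query:
        return query["wtrealm"][0]
    return None


if __name__ == "__main__":
    wsfed = ("https://sts.cloudready.ms/adfs/ls/?wa=wsignin1.0"
             "&wtrealm=https%3A%2F%2Fclaims.cloudready.ms")
    print(identifier_from_url(wsfed))
```

Compare the returned value character for character with the identifier on the relying party trust; a trailing slash or an http/https difference is enough to cause a mismatch.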
Is the correct Secure Hash Algorithm configured on the Relying Party Trust?
This one typically only applies to SAML transactions and not WS-Fed. In the SAML request below, there is a sigalg parameter that specifies which algorithm was used to sign the request:
If we URL decode the above value, we get:
In this instance, make sure this SAML relying party trust is configured for SHA-1 as well:
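The URL decoding and lookup can be sketched in Python as well. The `hash_for_sigalg` helper is hypothetical, and the table only covers the two RSA signature algorithm URIs for SHA-1 and SHA-256; any other value is returned unchanged so you can look it up yourself.

```python
from urllib.parse import urlsplit, parse_qs

# Standard XML Signature algorithm URIs and the hash they imply.
SIGALG_TO_HASH = {
    "http://www.w3.org/2000/09/xmldsig#rsa-sha1": "SHA-1",
    "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256": "SHA-256",
}


def hash_for_sigalg(url):
    """Return the hash named by the SigAlg parameter, or None if it is absent."""
    query = parse_qs(urlsplit(url).query)  # URL-decodes the parameter values
    sig_alg = query.get("SigAlg", [None])[0]
    # Unknown URIs are passed through as-is rather than guessed at.
    return SIGALG_TO_HASH.get(sig_alg, sig_alg)


if __name__ == "__main__":
    # Shortened, made-up SAMLRequest and Signature values.
    url = ("https://sts.cloudready.ms/adfs/ls/?SAMLRequest=abc"
           "&SigAlg=http%3A%2F%2Fwww.w3.org%2F2000%2F09%2Fxmldsig%23rsa-sha1"
           "&Signature=xyz")
    print(hash_for_sigalg(url))
```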
Is the Application sending a problematic AuthnContextClassRef?
In this case, the user would be able to log in to the application through the internal ADFS servers but not through the WAP/Proxy, or vice versa. Ultimately, the application can pass certain values in the SAML request that tell ADFS what authentication to enforce. ADFS and the WAP/Proxy servers must support that authentication protocol for the logon to be successful. The following values can be passed by the application:
I copy the SAMLRequest value and paste it into SSOCircle decoder:
Then click XML View:
The highlighted value above would ensure that users could only log in to the application through the internal ADFS servers since the external-facing WAP/Proxy servers don’t support integrated Windows authentication. You would also see an Event ID 364 stating that the ADFS and/or WAP/Proxy server doesn’t support this authentication mechanism:
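Once you have the decoded request XML, listing the requested authentication contexts takes only a few lines of Python. This is a sketch with a hypothetical helper name; the sample request in the usage section is hand-written for illustration and is not taken from a real application.

```python
import xml.etree.ElementTree as ET

NS = {
    "samlp": "urn:oasis:names:tc:SAML:2.0:protocol",
    "saml": "urn:oasis:names:tc:SAML:2.0:assertion",
}


def requested_authn_contexts(authn_request_xml):
    """Return every AuthnContextClassRef value in a decoded AuthnRequest."""
    root = ET.fromstring(authn_request_xml)
    refs = root.findall("samlp:RequestedAuthnContext/saml:AuthnContextClassRef", NS)
    return [ref.text.strip() for ref in refs if ref.text]


if __name__ == "__main__":
    # Illustrative request that asks for Windows integrated authentication.
    sample = (
        '<samlp:AuthnRequest xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" '
        'xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">'
        "<samlp:RequestedAuthnContext>"
        "<saml:AuthnContextClassRef>urn:federation:authentication:windows"
        "</saml:AuthnContextClassRef>"
        "</samlp:RequestedAuthnContext></samlp:AuthnRequest>"
    )
    print(requested_authn_contexts(sample))
```

An empty list means the application did not ask for a specific authentication method, so this section is unlikely to be your problem.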
Is there a problem with an individual ADFS Proxy/WAP server?
This one only applies if the user responded to your initial questions that they are coming from outside the corporate network and you haven’t yet resolved the issue based on any of the above steps. There are known scenarios where an ADFS Proxy/WAP will just stop working with the backend ADFS servers. Look at the following on all ADFS Proxy/WAP servers:
- Check the ADFS event logs for errors or warnings.
- Make sure the ADFS service is running.
- Make sure the Proxy/WAP server can resolve the backend ADFS server or VIP of a load balancer.
- Obviously make sure the necessary TCP 443 ports are open. 🙂
- Are you using a gMSA with Windows Server 2012 R2? There is a known issue where ADFS will stop working shortly after a gMSA password change. The following update will resolve this: http://support.microsoft.com/en-us/kb/3032590
- There are some known issues where the WAP servers have proxy trust issues with the backend ADFS servers:
3.) The SSO Transaction is Breaking when the User is Sent Back to Application with SAML token
Many of the issues on the application side can be hard to troubleshoot since you may not own the application, and the level of support you can get from the application vendor varies greatly. With the multitude of cloud applications out there, I won’t be able to demonstrate troubleshooting any of them in particular, but we’ll cover the most prevalent issues.
Is the URL/endpoint that the token should be submitted back to correct?
If the user is getting an error when trying to POST the token back to the application, the issue could be any of the following:
- The endpoint on the relying party trust in ADFS could be wrong.
- The endpoint on the relying party trust may not be configured for the POST binding.
If you suspect either of these, review the endpoint tab on the relying party trust and confirm the endpoint and the correct Binding (POST or GET) are selected:
- The client may be having an issue with DNS
- The application endpoint that accepts tokens just may be offline or having issues. Contact the owner of the application.
Is the Token Encryption Certificate configuration correct? (Optional)
This one is hard to troubleshoot because the application will enforce whether token encryption is required or not and depending on the application, it may not provide any feedback about what the issue is. If you suspect that you have token encryption configured but the application doesn’t require it and this may be causing an issue, there are only two things you can do to troubleshoot:
- Ask the owner of the application whether they require token encryption and if so, confirm the public token encryption certificate with them. Don’t compare names, compare thumbprints.
- It’s very possible they don’t require token encryption but still sent you a token encryption certificate. Remove the token encryption certificate from the configuration on your relying party trust and see whether it resolves the issue. You may find that you can’t remove the encryption certificate because the remove button is grayed out. The way to get around this is to first uncheck “Monitor relying party”:
To ensure you have a backup of the certificate, export the token encryption certificate first by View>Details>Copy to File. Then you can remove the token encryption certificate:
Now test the SSO transaction again to see whether an unencrypted token works.
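For the thumbprint comparison mentioned above, here is a small Python sketch that computes the SHA-1 thumbprint Windows displays for a certificate, so you can compare the file the application owner sent with the one you exported. The function names are mine, and the sketch assumes each file holds a single certificate in DER (.cer) or PEM form.

```python
import hashlib
import ssl

PEM_MARKER = b"-----BEGIN CERTIFICATE-----"


def cert_thumbprint(cert_bytes):
    """Return the SHA-1 thumbprint (uppercase hex) of a DER or PEM certificate."""
    if PEM_MARKER in cert_bytes:
        # Convert PEM text to DER first; the thumbprint is a hash of the DER bytes.
        cert_bytes = ssl.PEM_cert_to_DER_cert(cert_bytes.decode("ascii").strip())
    return hashlib.sha1(cert_bytes).hexdigest().upper()


def same_certificate(cert_a, cert_b):
    """Compare two certificates by thumbprint rather than by name."""
    return cert_thumbprint(cert_a) == cert_thumbprint(cert_b)


if __name__ == "__main__":
    import sys

    # Usage: python thumbprint.py exported.cer from_vendor.pem
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(path, cert_thumbprint(handle.read()))
```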
Is the Token Encryption Certificate passing revocation?
Also, ADFS may check the validity and the certificate chain for this token encryption certificate. This configuration is separate on each relying party trust. To check, run:
Get-AdfsRelyingPartyTrust -Name <RP Name>
You can see here that ADFS will check the chain on the token encryption certificate. If you would like to confirm this is the issue, test this setting by doing either of the following:
1.) Temporarily disable revocation checking entirely and then test:
Set-AdfsRelyingPartyTrust -TargetIdentifier "https://shib.cloudready.ms" -EncryptionCertificateRevocationCheck "None"
2.) Or run certutil to check the validity and chain of the cert:
certutil -urlfetch -verify c:\users\dgreg\desktop\encryption.cer
Does the application have the correct token signing certificate?
This one is hard to troubleshoot because the transaction will bomb out on the application side and, depending on the application, you may not get any good feedback or error messages about the issue. Just make sure that the application owner has the correct, current token signing certificate. Confirm the thumbprint and make sure to get them the certificate in the right format (.cer or .pem).
Here is a .NET web application based on the Windows Identity Foundation (WIF) throwing an error because it doesn’t have the correct token signing certificate configured:
Does the application have the correct ADFS identifier?
When this is misconfigured, everything will work until the user is sent back to the application with a token from ADFS, because the issuer in the SAML token won’t match what the application has configured. At that point, the application will error out. Applications based on the Windows Identity Foundation (WIF) appear to handle ADFS identifier mismatches without error, so this only applies to SAML applications. The default ADFS identifier is http://<sts.domain.com>/adfs/services/trust
Notice there is no HTTPS. Confirm what your ADFS identifier is and ensure the application is configured with the same value:
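If you have captured the SAMLResponse value from the POST back to the application, a short Python sketch can show which issuer ADFS actually put in the token. The helper names are hypothetical, and the sketch assumes the value is the Base64 form field exactly as posted (POST binding, so no DEFLATE step).

```python
import base64
import xml.etree.ElementTree as ET

SAML_ASSERTION_NS = "urn:oasis:names:tc:SAML:2.0:assertion"


def token_issuer(saml_response_b64):
    """Return the first Issuer value found in a Base64-encoded SAMLResponse."""
    root = ET.fromstring(base64.b64decode(saml_response_b64))
    issuer = root.find(".//{%s}Issuer" % SAML_ASSERTION_NS)
    if issuer is not None and issuer.text:
        return issuer.text.strip()
    return None


def issuer_matches(saml_response_b64, expected_identifier):
    """Exact comparison: http vs https or a trailing slash counts as a mismatch."""
    return token_issuer(saml_response_b64) == expected_identifier


if __name__ == "__main__":
    # Hand-written, minimal response used only to demonstrate the helpers.
    sample = base64.b64encode(
        b'<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" '
        b'xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">'
        b"<saml:Issuer>http://sts.cloudready.ms/adfs/services/trust</saml:Issuer>"
        b"</samlp:Response>"
    )
    print(token_issuer(sample))
    print(issuer_matches(sample, "https://sts.cloudready.ms/adfs/services/trust"))
```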
What claims, claim types, and claims format should be sent? (Optional)
This one is nearly impossible to troubleshoot because most SaaS applications don’t provide error messages detailed enough to tell you whether the claims you’re sending them are the problem. If we’ve gone through all the above troubleshooting steps and still haven’t resolved it, I will get a copy of the SAML token, download it as an .xml file, send it to the application owner, and tell them:
This is the SAML token I am sending you and your application will not accept it. Tell me what needs to be changed to make this work – claims, claims types, claim formats?
Once again, open up Fiddler and capture a trace that contains the SAML token you’re trying to send them:
Copy the entire SAMLResponse value and paste into SSOCircle decoder and select POST this time since the client was performing a form POST:
And then click XML view and you’ll get the XML-based SAML token you were sending the application:
Save the file from your browser and send this to the application owner and have them tell you what else is needed. It is their application and they should be responsible for telling you what claims, types, and formats they require. A lot of the time, they don’t know the answer to this question so press on them harder.
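Before you send the token off, it can help to summarize what is in it. The Python sketch below lists the NameID and every attribute (claim) in a captured SAMLResponse. It is an illustration with a hypothetical function name, and it only works when the token is not encrypted; with token encryption enabled, the assertion contents are not readable this way.

```python
import base64
import xml.etree.ElementTree as ET

NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}


def claims_in_token(saml_response_b64):
    """Return {claim type: [values]} for a Base64-encoded, unencrypted SAMLResponse."""
    root = ET.fromstring(base64.b64decode(saml_response_b64))
    claims = {}
    name_id = root.find(".//saml:Subject/saml:NameID", NS)
    if name_id is not None:
        # Report the NameID format too, since applications are often picky about it.
        claims["NameID (%s)" % name_id.get("Format", "no format")] = [name_id.text]
    for attribute in root.iterfind(".//saml:Attribute", NS):
        values = [v.text for v in attribute.findall("saml:AttributeValue", NS)]
        claims[attribute.get("Name")] = values
    return claims


if __name__ == "__main__":
    import sys

    # Usage: python claims.py <file containing the SAMLResponse form value>
    with open(sys.argv[1], "rb") as handle:
        for claim_type, values in claims_in_token(handle.read()).items():
            print(claim_type, "=", values)
```

Having the claim types, formats, and values in a plain list makes the conversation with the application owner more concrete than sending raw XML alone.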
Environmental or User-specific Issues?
Be sure to check the following:
- Make sure the service principal name (SPN) is only on the ADFS service account or gMSA:
Setspn -L <service account name or gMSA name>
Example Service Account: Setspn -L SVC_ADFS
- Make sure there are no duplicate service principal names (SPN) within the AD forest. If you find duplicates, read my blog from 3 years ago: http://blogs.technet.com/b/askpfeplat/archive/2012/03/29/the-411-on-the-kdc-11-events.aspx
Setspn -X -F
- Make sure their browser supports integrated Windows authentication and, if so, make sure the ADFS URL is in their intranet zone in Internet Explorer.
- Make sure the DNS record for ADFS is a Host (A) record and not a CNAME record. CNAME records are known to break integrated Windows authentication.
- Don’t make your ADFS service name match the computer name of any servers in your forest. It will create a duplicate SPN issue and no one will be able to perform integrated Windows Authentication against the ADFS servers.
If the users are external, you should check the event log on the ADFS Proxy or WAP they are using, which brings up a really good point. If you have an ADFS WAP farm behind a load balancer, how will you know which server they’re using? For this reason, we recommend you modify the sign-on page of every ADFS WAP/Proxy server so the server name is at the bottom of the sign-in page. Then you can ask the user which server they’re on and you’ll know which event log to check.
How is the user authenticating to the application?
Check the following things:
- If using PhoneFactor, make sure their user account in AD has a phone number populated.
- If using smartcard, do your smartcards require a middleware like ActivIdentity that could be causing an issue?
- If using username and password and if you’re on ADFS 2012 R2, have they hit the soft lockout feature, where their account is locked out at the WAP/Proxy but not in the internal AD? Here is another Technet blog that talks about this feature:
- Or perhaps their account is just locked out in AD.
David “Troublemaker” Gregory