Unexpected logout with SharePoint 2013 and ADFS
Over the last couple of weeks I created and configured three SharePoint 2013 farms (Test, Acceptance and Production) on Windows Azure. We did the provisioning, installation and configuration with PowerShell, which had the basics running within one week. Because we are on Windows Azure and the company did not want its AD extended to the Windows Azure environment, we configured SharePoint with ADFS 3.0. This gives end users the single sign-on experience they are used to and lets them connect directly from outside the company network. All of this was tested in a proof of concept a few months back and worked perfectly. That proof-of-concept environment was a simple setup with one front-end, one back-end and a SQL server.
The Production environment now contains multiple front-end, back-end and SQL servers like the image below.
The environment runs on Server 2012 and SharePoint 2013 with the December 2013 CU. The SQL environment runs on Server 2012 R2 and SQL Server 2012.
We reach the front-end machines through a load-balanced endpoint that uses round-robin. After the installation we created a web application with a few site collections and started testing. The web application has 2 authentication providers configured.
- Windows authentication
- Trusted Identity provider (ADFS)
The default page you get with 2 authentication providers can be confusing for users: they expect to log in with Windows Authentication, and that won't work because Windows Authentication is only used for search and service accounts. We searched the web and found a nice solution on CodePlex that redirects users to the correct login page depending on their location. The only catch was a bug in this solution: it works fine in the browser, but when we opened an Office client we got an authentication prompt. The fix was simple, but it took us a couple of hours to find the cause. I reported it to the CodePlex project. Now users have a single sign-on experience when they are within the corporate network.
After this problem was fixed we installed a few solutions and let the end users test the new environment. At this point we got complaints that forms and file uploads were failing. We started investigating and saw that the FedAuth cookie was set to empty when the error occurred. The error was completely random and could also happen while the user was simply browsing in SharePoint. Once we knew the cookie was being emptied, we searched the internet and the ULS logs. In the ULS logs we found a few errors about the distributed Logon Token Cache, and we found a blog post describing the same errors (http://www.habaneroconsulting.com/insights/sharepoint-2013-distributed-cache-bug). We implemented the changes they made to fix their problem, which include installing AppFabric CU4, and the errors disappeared from the ULS log. So we were happy.
During the investigation we had taken 2 of the 3 servers out of the Windows Azure endpoints so that we had only one ULS log to go through. With the fix in place we started testing, and everything worked fine with one front-end. So we added the other 2 servers back to the load balancer. After a few minutes we had the same problem again, so we searched the ULS logs again and found messages that the entry for the user was missing in the Token Cache. See the errors we got below.
- 02/21/2014 21:22:44.74 w3wp.exe (:0x41F8) 0x3F98 SharePoint Foundation Claims Authentication fr16 Verbose Token Cache: Entry missing for user ''. 1188769c-d525-50d6-8cb6-4ed78c5f151a
- 02/21/2014 21:22:44.74 w3wp.exe (:0x41F8) 0x3F98 SharePoint Foundation Claims Authentication af30i Verbose Token Cache: Failed to find token for user '' for cookie so signing out the user. 1188769c-d525-50d6-8cb6-4ed78c5f151a
- 02/21/2014 21:22:44.75 w3wp.exe (:0x41F8) 0x3F98 SharePoint Foundation Claims Authentication af3xq Verbose SPChunkedCookieHandler: Removed '1' cookie(s) with name 'FedAuth' for request ''. 1188769c-d525-50d6-8cb6-4ed78c5f151a
- 02/21/2014 21:22:44.78 w3wp.exe (:0x41F8) 0x3F98 SharePoint Foundation Claims Authentication ajrjo Verbose SPUtility.IsInitialMDSStartPageRequest: Returning False for input [SPAlternateUrl:''] 1188769c-d525-50d6-8cb6-4ed78c5f151a
- 02/21/2014 21:22:44.78 w3wp.exe (:0x41F8) 0x3F98 SharePoint Foundation Claims Authentication ajrjg Verbose SPUtility.HandleAccessDenied: This is not an MDS start page, continue with redirect to authenticate.aspx. [AlternateUrl:] [RequestPath:] 1188769c-d525-50d6-8cb6-4ed78c5f151a
- 02/21/2014 21:22:44.85 w3wp.exe (:0x41F8) 0x1A10 SharePoint Foundation Claims Authentication aip7c Verbose SPFederationAuthenticationModule.OnEndRequest: User was being redirected to authenticate. 1188769c-453d-50d6-8cb6-476dc4b70b54
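With several front-ends behind the load balancer, the entries above can end up in different ULS logs. A minimal sketch of collecting them into one file, assuming the SharePoint 2013 Management Shell and that the Claims Authentication category is already logging at Verbose level; the output path and the time window are hypothetical and only chosen to match the timestamps above:

```powershell
# Run on any farm server; loads the SharePoint cmdlets if they are not loaded yet.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical output path and time window; adjust to the incident being investigated.
$mergedLog = 'C:\Temp\ClaimsAuth-merged.log'
$start     = Get-Date '2014-02-21 21:15'
$end       = Get-Date '2014-02-21 21:30'

# Collect the Claims Authentication entries from every server in the farm into one file.
Merge-SPLogFile -Path $mergedLog -Category 'Claims Authentication' `
    -StartTime $start -EndTime $end -Overwrite

# Show only the token cache messages from the merged file.
Select-String -Path $mergedLog -Pattern 'Token Cache' |
    Select-Object -ExpandProperty Line
```

This would avoid having to take servers out of the load balancer just to keep the log search manageable.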
We could not find a solution so we created a case with Microsoft to investigate the issue. We had some suspects:
- Maximum number of cookies in the browser
- Minimal Download Strategy (MDS)
- Distributed Cache
Maximum number of cookies
This was one of the suspects because we saw in Fiddler that the cookie was set to empty. A browser has to keep a minimum number of cookies. That minimum is 50, but a browser implementation may allow more. You can check this with the following page: http://krijnhoetmer.nl/stuff/javascript/maximum-cookies. We found that we only had 15 cookies, so this could not be our problem.
Minimal Download Strategy
We did not investigate this because we were focusing on the Distributed Cache, but I did find a blog post explaining the Minimal Download Strategy in combination with customizations, and another one that explains MDS itself.
Distributed Cache
We had already changed some of the default settings based on the Habanero blog post. With the help of Microsoft we took a closer look at the Distributed Cache settings. We found that the amount of memory allocated to the Distributed Cache was not set correctly. It should be the server's total memory minus 2 GB, divided by 2. See this TechNet article. After changing the amount of memory we still had issues. We then changed the network settings on the servers to disable TCP Chimney Offload, which can be done with the following command:
Netsh int tcp set global chimney=disabled
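For the memory sizing mentioned above, the rule works out as follows: a server with 16 GB would get (16384 MB - 2048 MB) / 2 = 7168 MB. The snippet below is a rough sketch of that calculation, not the exact steps we ran. It assumes a dedicated cache host, and it assumes the Distributed Cache service has been stopped on all cache hosts before the size is changed and is started again afterwards:

```powershell
# Total physical memory of this server in MB.
$totalMB = [math]::Floor((Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory / 1MB)

# Reserve 2 GB for other processes, then give half of the remainder to the cache.
$cacheSizeMB = [math]::Floor(($totalMB - 2048) / 2)

"Total memory: $totalMB MB, suggested cache size: $cacheSizeMB MB"

# Apply the new size (assumption: the service is stopped on all cache hosts at this point).
Update-SPDistributedCacheSize -CacheSizeInMB $cacheSizeMB
```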
These changes were only part of the solution: we still got the error when we had more than one server in the load balancer. We created a network trace from the workstation to the server and sent it to Microsoft. They came back with a set of changes to the Distributed Cache that finally solved our problem. We made the following changes:
# Each block reads the client settings of one container, raises the limits, and writes them back.
$settings = Get-SPDistributedCacheClientSetting -ContainerType DistributedAccessCache
$settings.MaxConnectionsToServer = 4
$settings.ReceiveTimeout = "120000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedAccessCache -DistributedCacheClientSettings $settings

$settings = Get-SPDistributedCacheClientSetting -ContainerType DistributedActivityFeedCache
$settings.MaxConnectionsToServer = 4
$settings.ReceiveTimeout = "120000"
$settings.MaxBufferSize = "16777216"
$settings.ChannelInitializationTimeout = "120000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedActivityFeedCache -DistributedCacheClientSettings $settings

$settings = Get-SPDistributedCacheClientSetting -ContainerType DistributedActivityFeedLMTCache
$settings.MaxConnectionsToServer = 4
$settings.ReceiveTimeout = "120000"
$settings.MaxBufferSize = "16777216"
$settings.ChannelInitializationTimeout = "120000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedActivityFeedLMTCache -DistributedCacheClientSettings $settings

$settings = Get-SPDistributedCacheClientSetting -ContainerType DistributedBouncerCache
$settings.MaxConnectionsToServer = 4
$settings.ReceiveTimeout = "120000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedBouncerCache -DistributedCacheClientSettings $settings

$settings = Get-SPDistributedCacheClientSetting -ContainerType DistributedDefaultCache
$settings.MaxConnectionsToServer = 4
$settings.ReceiveTimeout = "120000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedDefaultCache -DistributedCacheClientSettings $settings

$settings = Get-SPDistributedCacheClientSetting -ContainerType DistributedSearchCache
$settings.MaxConnectionsToServer = 4
$settings.ReceiveTimeout = "120000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedSearchCache -DistributedCacheClientSettings $settings

$settings = Get-SPDistributedCacheClientSetting -ContainerType DistributedSecurityTrimmingCache
$settings.MaxConnectionsToServer = 4
$settings.ReceiveTimeout = "120000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedSecurityTrimmingCache -DistributedCacheClientSettings $settings

$settings = Get-SPDistributedCacheClientSetting -ContainerType DistributedServerToAppServerAccessTokenCache
$settings.MaxConnectionsToServer = 4
$settings.ReceiveTimeout = "120000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedServerToAppServerAccessTokenCache -DistributedCacheClientSettings $settings

$settings = Get-SPDistributedCacheClientSetting -ContainerType DistributedViewStateCache
$settings.MaxConnectionsToServer = 4
$settings.ReceiveTimeout = "120000"
$settings.MaxBufferSize = "16777216"
$settings.ChannelInitializationTimeout = "120000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedViewStateCache -DistributedCacheClientSettings $settings
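To check that the values were actually stored, the settings can be read back per container. This is a small sketch that only reads, assuming the SharePoint 2013 Management Shell; the container list is simply the one used above:

```powershell
# The containers changed above.
$containers = 'DistributedAccessCache', 'DistributedActivityFeedCache',
              'DistributedActivityFeedLMTCache', 'DistributedBouncerCache',
              'DistributedDefaultCache', 'DistributedSearchCache',
              'DistributedSecurityTrimmingCache',
              'DistributedServerToAppServerAccessTokenCache',
              'DistributedViewStateCache'

# Read the client settings of each container and list the properties we touched.
$report = foreach ($container in $containers) {
    Get-SPDistributedCacheClientSetting -ContainerType $container |
        Select-Object @{ Name = 'Container'; Expression = { $container } },
                      MaxConnectionsToServer, ReceiveTimeout,
                      MaxBufferSize, ChannelInitializationTimeout
}

$report | Format-Table -AutoSize
```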
So for every container in the Distributed Cache we needed to change some settings. After an IIS reset, and with all servers back in the load balancer, the only sign-in we still got was the expected one after 50 minutes.
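The 50 minutes is consistent with the default values, assuming nothing was changed on either side: an ADFS token lifetime of 60 minutes minus SharePoint's LogonTokenCacheExpirationWindow of 10 minutes. A sketch for checking both values; the relying party name is hypothetical, and a TokenLifetime of 0 on the ADFS side is assumed to mean the default of 60 minutes:

```powershell
# On a SharePoint server: the window subtracted from the token lifetime.
$sts = Get-SPSecurityTokenServiceConfig
$sts.LogonTokenCacheExpirationWindow

# On the ADFS server: token lifetime in minutes for the relying party trust.
# 'SharePoint 2013' is a hypothetical relying party name.
Get-AdfsRelyingPartyTrust -Name 'SharePoint 2013' |
    Select-Object Name, TokenLifetime
```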