When building an AlwaysOn Availability Group that’s hosted on VMs in Azure you may have issues querying through the Availability Group name/IP address from on-prem. You won’t see any issues when running queries from inside Azure, but you’ll see issues when running queries from machines on-prem, when connecting to Azure via a site to site VPN connection. The error that you’ll see is going to be similar to:
Msg 121, Level 20, State 0, Line 0
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 – The semaphore timeout period has expired.)
Thankfully the fix is actually pretty simple. We just need to drop the MTU on the AG members from 1500 to 1350. Why we have to do this, I have no idea, but it works. We do this by logging onto the console of the VM and use the netsh command to change the MTU. You’ll want to schedule a job to run on the VM at startup as every time the VM restarts and moves the VM to another host the MTU may change back.
I used PowerShell to change the MTU on startup.
$AdapterName = $(Get-NetAdapter | Where { $_.Name -Match ‘Ethernet’}).Name
netsh interface ipv4 set subinterface “$AdapterName” mtu=1350 store=persistent
I choose PowerShell because the network adapter name can change (especially in Classic VMs) so we need to grab the correct name on startup.
Run the PowerShell manually, then schedule it to run at startup and the semaphore timeouts will go away. I am working with SQL Server and Azure engineering to figure out why this happens when using an AG (with an Internal Load Balancer of course) and a site to site VPN (policy based in this case) so that we can fix this and make it not happen anymore. I’ll report back when I hear back about a permanent resolution.
Denny
The post Accessing an Azure hosted Availability Group from On-Prem appeared first on SQL Server with Mr. Denny.
One Response
We also see problems for the same reason without the error. The symptom we have seen is that the SQL query will take a very long time, but it will not time out. Once the MTU change was made the queries finished in seconds vs over 10 minutes in some cases.