For a few weeks now, Failover Clustered Instances in the Microsoft Azure cloud have been possible by using SIOS DataKeeper Cluster Edition to cluster the VMs together and give yourself shared storage. This has actually been possible for a while; you just needed to know how to do it. Now it’s a fully Azure Certified configuration, and VMs with SIOS DataKeeper preinstalled are even available from the Azure Marketplace.
When setting up clustering in Azure, be sure to follow one of the various scripts that are out there so that you can set up what’s called the Internal Load Balancer (the ILB) within Azure. The scripts I like the most are Dave Bermingham’s, which can be found on his blog.
When you get down to the “Create an Internal Load Balancer” section, pay special attention to some of the settings in the Get-AzureVM lines, as some of these values determine how quickly the ILB sees that the SQL instance has moved from one VM to another. Under the default settings shown in Dave’s blog post (don’t blame him, these are the same scripts that you’ll find on MSDN; I just like how Dave presents them better), the ProbeIntervalInSeconds parameter is set to 10 seconds, which means that the ILB only checks which VM the clustered IP address is running on every 10 seconds. By default the probe must fail twice before the ILB will move the connections to the new VM. This means that the cluster will be unreachable for an additional ~20 seconds between when SQL comes up on the new cluster node and when connections to it succeed.
You can shorten this delay by setting the ProbeIntervalInSeconds parameter to a lower number. The lowest supported value is 5 seconds, which would reduce the extra outage from ~20 seconds to ~10 seconds. That is definitely something I would recommend, because we want to keep the downtime window as short as possible; the whole reason for clustered SQL instances is to get the most availability possible.
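To make the arithmetic concrete, here is a minimal Python sketch of the estimate used above: the probe interval multiplied by the number of failed probes the ILB waits for. The function name and its parameters are hypothetical and are not part of any Azure tooling; the default of two failed probes and the 5 second minimum are simply the figures quoted in this post.

```python
def ilb_detection_delay(probe_interval_seconds, failed_probes_required=2):
    """Rough extra outage (in seconds) before the ILB redirects connections.

    Assumption: the delay is approximated as interval * failed probes,
    which is the estimate used in this post, not an Azure guarantee.
    """
    if probe_interval_seconds < 5:
        # 5 seconds is the lowest supported value mentioned in the post.
        raise ValueError("probe interval must be at least 5 seconds")
    return probe_interval_seconds * failed_probes_required


# Compare the default interval from the scripts with the lowest supported one.
for interval in (10, 5):
    delay = ilb_detection_delay(interval)
    print(f"ProbeIntervalInSeconds={interval}: ~{delay} seconds of extra outage")
```

The real delay will vary with where in the probe cycle the failover happens, so treat these numbers as a rough upper estimate rather than a measurement.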
Denny