Hot Swappin’

Deploy & swap your Sitecore Azure Web App without downtime

This blog post is the product of my cooperation with Mark de Bruijne, senior lead developer at Colours. Mark extended my Azure ARM scripts with the optimizations shown below, based on his research into avoiding downtime when swapping, and helped me tremendously with writing this article.

In our Continuous Delivery approach, we happily use the Blue Green deployment strategy for Sitecore on Azure. We use Visual Studio Team Services (VSTS) to fully automate the deployment process. Once we trigger a deployment to the production environment, the following steps are executed sequentially:

  1. Provision a fresh (emptied) staging slot for our Web App, containing only the vanilla files Sitecore ships in its Sitecore Web Deploy packages (.scwdp).
    Note: We stripped out the database logic so the existing database content is not affected.
  2. Fill the staging slot with the following artifacts from our build process:
    • Web application: the built output of our Visual Studio solution.
    • Frontend: the optimized front-end assets.
    • Configs: the web and Sitecore configuration files, optionally transformed specifically for the applicable role instance.
    • Content Management role only: deploy Unicorn .yml files and execute a synchronization.

And if all steps of this deployment succeed:

  3. Immediately swap the staging slot so it becomes the production slot.

Cold starts after swapping slots

Recently, when swapping the staging slot into the live environment, we noticed that after the swap had finished, visitors were hitting a web application that was still starting up. Such a cold start can take 2 to 4 minutes, depending on your scale and, of course, the complexity of the web application you’ve built.

Although some say such a delay in being served a page isn’t downtime from a technical point of view, it is definitely something we wanted to avoid: our valuable customers and visitors deserve zero downtime, also during deployments. Additionally, zero-downtime deployments allow you to deploy more frequently.

Warm slot swapping checklist

Let’s start with the relevant settings we identified while mitigating cold starts after a deployment. We want to share these learnings with you, so we’ve created a checklist you can use to find out what is keeping you from having warm swaps.

Context: a sample setup

To establish a common understanding of our starting point, here are some details about the setup we used for this example:

  • Sitecore version: we used Sitecore 8.2, but presumably this applies equally to any Web App compatible version (8.2 initial release and later).
  • Slots: for our App Service (Web Apps) we have configured two deployment slots:
    • Staging slot
    • Production slot
  • Host names: both slots have their own Azure host name.
    The production slot also has a custom domain configured, on top of the default host name:

    • Staging slot:
      • https://my-customer-cd-staging.azurewebsites.net
    • Production slot:
      • https://my-customer-cd.azurewebsites.net
      • https://www.mycustomer.net
  • Sitecore site definitions: we configure the hostName="" attribute in the Sitecore configuration of each slot separately, so the host name of the staging slot is not present in the hostName="" attribute of the site definition in the production slot.
  • Application settings: an Application Setting can be configured as a “Slot setting” (also known as “sticky”). This sample setup uses such settings.
  • IP filter: this example uses IP filtering (configured via the Azure Portal UI or in config), as is common for non-production environments. You want warm swaps for those environments as well.

Enough introduction, let’s swap to solutions!

1 | Do I really need those sticky slot settings?

As Bas Lijten already explained in his blog post about zero-downtime deployments for Sitecore on Azure, slot-specific settings will give you cold starts after swapping. The reason is that after the swap, the runtime configuration of the swapped slot is updated with the sticky settings of the production slot it was swapped into. The result is the same as adjusting your web.config: it triggers a restart of your web application.

So if you can avoid such sticky slot settings, you might be able to avoid cold starts easily. But of course it would be great to have both slot-specific settings and warm swaps, right? Let’s continue…

2 | Is the host name of the staging slot (also) configured in the production slot?

Although we are not entirely sure why this would be a problem, our experience is that the host name used for the first request, the one that triggers the warm-up of your web application, does matter from a Sitecore perspective.

Let’s say you have a cold web application and you initially request it with the host name https://my-customer-cd.azurewebsites.net. As a result, the Sitecore web application warms up and starts. So far so good. But when you then request it using an alternative host name that is also configured, like https://www.mycustomer.net, the Sitecore web application seemed to go through an additional startup, causing waiting time again.

In addition, our Sitecore site definition in the staging slot was only configured with the host name https://my-customer-cd-staging.azurewebsites.net. So if the slot was warmed up during the swap, that host name was probably used. And if no specific host name was used for the warm-up, the default (fallback) Sitecore site would have been warmed instead. Either way, after the swap completed, visitors were hitting the web application via https://www.mycustomer.net, which matched neither the host names nor the Sitecore site definitions that applied to the staging slot. So once swapped, it could very well be that the first visitors face a cold start of your web application.

So check whether your custom domain (if applicable), the Azure host name of the production slot and the Azure host name of the staging slot are all configured, separated by pipes (|), in the hostName="" attribute of your Sitecore site definition. This could help avoid cold starts. If this isn’t a solution for your specific setup, read on.

Note: from an SEO perspective, when multiple host names are configured, always also specify the targetHostName="" attribute with the host name you want your links to be generated with.
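A minimal sketch of such a site definition, written as a Sitecore include patch file. It assumes the site is the default "website" definition and uses the host names from our sample setup; your site name and file location will likely differ. Note that hostName and targetHostName take bare host names, without the https:// scheme.

```xml
<!-- Hypothetical include patch file for the CD role; file name and location are assumptions -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <sites>
      <!-- Assumes the Sitecore default site name "website"; use your own site name -->
      <site name="website">
        <!-- Bare host names, pipe-separated: custom domain, production slot, staging slot -->
        <patch:attribute name="hostName">www.mycustomer.net|my-customer-cd.azurewebsites.net|my-customer-cd-staging.azurewebsites.net</patch:attribute>
        <!-- Host name Sitecore uses when generating links -->
        <patch:attribute name="targetHostName">www.mycustomer.net</patch:attribute>
      </site>
    </sites>
  </sitecore>
</configuration>
```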

3 | Have I configured IP restrictions?

An IP filter is a good, easy-to-configure way to prevent non-production environments from being accessed by search engine crawlers, or simply to restrict access from the internet. It can also be used in a setup where your Sitecore Content Delivery Web App is served via a reverse proxy.

But did you know that forgetting to add the local loopback IPv4 address 127.0.0.1 to your whitelist can result in cold starts after a swap? We didn’t either, until we took a deep dive into the logging of the warm-up process during a swap. Simply adding this IPv4 address to your whitelist can be the solution for you.
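For reference, a minimal sketch of an IP filter in web.config that includes the loopback address. The second entry is a hypothetical placeholder from a documentation address range; replace it with your own whitelist. If you configure the filter in the Azure Portal instead, add 127.0.0.1 to the list there.

```xml
<configuration>
  <system.webServer>
    <security>
      <!-- Deny every IP address that is not explicitly listed below -->
      <ipSecurity allowUnlisted="false">
        <!-- Local loopback: lets the internal warm-up requests during a swap through -->
        <add ipAddress="127.0.0.1" allowed="true" />
        <!-- Hypothetical office IP address (documentation range); replace with your own -->
        <add ipAddress="203.0.113.10" allowed="true" />
      </ipSecurity>
    </security>
  </system.webServer>
</configuration>
```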

If this isn’t a solution for your specific setup, read on.

Solution: custom warm-up during swapping!

Luckily, there is a solution for all of the above: the Application Initialization Module.

Quoted from the Azure documentation:

Some apps may require custom warm-up actions. The applicationInitialization configuration element in web.config allows you to specify custom initialization actions to be performed before a request is received. The swap operation will wait for this custom warm-up to complete.

And our Sitecore web application is exactly such an app ;). By using the applicationInitialization element in our web.config, we are able to:

  • Use slot specific settings without having cold starts.
  • Force the web application to warm itself up using the host name we specify.
  • Fully warm up the slot that is about to be swapped, even with IP restrictions applied.

In the end, it is a matter of adding a simple configuration fragment to the system.webServer section of your web.config.
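Based on the explanation in the next paragraphs, a minimal sketch of such a fragment could look like this. It assumes visitors hit the production slot via the custom domain from our sample setup (www.mycustomer.net); merge the element into the existing system.webServer section of your own web.config.

```xml
<configuration>
  <system.webServer>
    <!-- Re-run the warm-up after every application restart, including the extra restart caused by slot settings -->
    <applicationInitialization doAppInitAfterRestart="true">
      <!-- Request the site root with the host name the slot will serve once it is the production slot -->
      <add initializationPage="/" hostName="www.mycustomer.net" />
    </applicationInitialization>
  </system.webServer>
</configuration>
```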

The doAppInitAfterRestart attribute specifies that the initialization process starts automatically whenever an application restart occurs. This is needed because slot-specific settings trigger an additional restart during a swap.

Although the root of the web application (“/”) is requested during the internal warm-up anyway, we explicitly specified it as an initializationPage, because that lets us configure the host name it is requested with. Note that this host name is only used internally, as a request header; it is not used externally (the request does not go over the internet via a DNS lookup of the configured host name). This ensures that the slot about to be swapped is warmed up with the host name it will be hit with once it becomes the production slot.

If you want to, you can add multiple pages, such as important landing pages.

You can also lightly warm up the Sitecore client sites on your Content Management Web App. As those are separate Sitecore site definitions, they warm up separately. This saves your content editors waiting time after a deployment.
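A sketch of what this could look like for the Content Management Web App. The CM host name my-customer-cm.azurewebsites.net is hypothetical, and the /sitecore/login and /sitecore/shell paths are assumptions for reaching the login and shell site definitions. Without authentication, a request to /sitecore/shell is typically redirected to the login page, so treat this as a light warm-up rather than a full one.

```xml
<configuration>
  <system.webServer>
    <applicationInitialization doAppInitAfterRestart="true">
      <!-- Hypothetical CM host name; replace with the host name of your CM production slot -->
      <add initializationPage="/" hostName="my-customer-cm.azurewebsites.net" />
      <!-- Assumed paths for the Sitecore login and shell client sites -->
      <add initializationPage="/sitecore/login" hostName="my-customer-cm.azurewebsites.net" />
      <add initializationPage="/sitecore/shell" hostName="my-customer-cm.azurewebsites.net" />
    </applicationInitialization>
  </system.webServer>
</configuration>
```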

With this custom warm-up configured, you will notice that the swap itself takes some extra time. But that is a price you’ll be willing to pay, as it saves your visitors valuable waiting time. It is all about user experience, right?

After implementing this custom warm-up, we were back where we wanted to be: continuous delivery without any downtime!

One last thing: configuring the Application Initialization Module also ensures you don’t have any downtime when you (automatically) scale out or in horizontally while using slot-specific settings. New worker processes are forced to complete your warm-up configuration before they serve traffic. But be warned: if you scale up (or down) vertically, this setting does NOT help to avoid downtime.

Troubleshooting / Under the hood

If you have configured the custom warm-up and still face cold starts after a swap, or if you want to take a look under the hood of this feature, you can use Failed Request Tracing for your Azure Web App. The detailed logs can then be accessed using Kudu. RuslanY has written a good walkthrough of how to do this.
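While troubleshooting, it can help to trace the successful warm-up requests as well, not only the failed ones. Below is a sketch of a tracing rule in web.config that treats every status code as a "failure", assuming Failed Request Tracing is also switched on in the diagnostics logs settings of your Web App. The selected trace areas are an assumption; remove the rule again once you are done, as verbose tracing adds overhead.

```xml
<configuration>
  <system.webServer>
    <tracing>
      <traceFailedRequests>
        <!-- Trace requests for all paths -->
        <add path="*">
          <traceAreas>
            <!-- Assumed set of areas; widen or narrow as needed -->
            <add provider="WWW Server" areas="Security,RequestNotifications,Module" verbosity="Verbose" />
          </traceAreas>
          <!-- Count every status code as failed, so successful warm-up requests are traced too -->
          <failureDefinitions statusCodes="200-999" />
        </add>
      </traceFailedRequests>
    </tracing>
  </system.webServer>
</configuration>
```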

We hope this article helped you understand which settings influence the startup times of your Sitecore Web App and how to prevent downtime when swapping slots. If you have any questions or additional findings, send us a message on Twitter (@rhabraken, @markuznl) or comment on this article below.