When it comes to slow and broken digital user experiences, none of us has any patience. When someone can't access a website to get the information they need, they click the browser's back button and move on to the next link in their search results. Drupal has continually improved performance by adding a powerful cache management layer to Drupal 8. Meanwhile, any time database changes are deployed to a Drupal website, the recommended and default behavior is to display a maintenance page across the entire website, including the homepage.
For better or worse, displaying Drupal's maintenance page is a broken user experience.
There are technical reasons why Drupal's maintenance page exists, but end-users don't care about them. End-users come to a website to get information. To their minds, if they can't get this information immediately, the website is broken. Sure, the maintenance page can provide some more information and reasons why the website is unavailable. Still, a website's digital door is temporarily shut. The expectation is that the internet superhighway is available 24/7, yet in the Drupal community we are okay with a little downtime every week or so.
We need to examine this problem and come up with some better solutions.
Why does maintenance mode exist?
The best metaphor as to why Drupal needs to display a maintenance page when deploying code and database changes is…
You can't work on a car's engine while it is driving down the highway.
Drupal's maintenance mode has been around since Drupal 4.7. Khalid Baheyeldin (kbahey) contributed the first patch in Issue #32622: Site Offline Under Maintenance feature. The Drupal community came to an immediate consensus that stands today: being able to take a site offline was and still is a useful feature.
Drupal's maintenance page provides a way to stop visitors from accessing an undesirable user experience while allowing administrators and developers to access a site.
The existence of Drupal’s maintenance mode makes sense. Even preventing a user from accessing a broken website while it is being updated makes sense. The problem is that displaying a maintenance page across an entire website feels like a broken user experience to end-users. The possible solutions to this problem lie in not having or not displaying a broken website.
Is maintenance mode always needed?
Drupal is a complex architecture with a lot of moving parts; therefore, certain parts of Drupal are going to need to be updated in isolation. We can't assume that a deployment that works in a staging/testing environment is going to work the same way in a production/live environment. Production environments are under considerably more load than any staging/testing environment. Production servers will have concurrent page requests and form submissions, which may fail as code and database records are updated, and even when Drupal's cache is cleared.
I explored with the maintainers of Drush the possibility of allowing a developer to decide when a site should not be put into maintenance mode during database updates. The answer is that deploying code and database changes without maintenance mode can result in data loss.
The risk of data loss means we must still use maintenance mode when deploying code and database changes.
The impact of a broken website during code and database updates can be mitigated by reducing the downtime. Code changes generally take only a minute or two, while database changes can require long batch processes and can take several minutes.
I can speak from my experience building and maintaining the Webform module for Drupal 8. I have written over 150 incremental updates to Webform-related configuration and data schemas. There might be an opportunity to review and rethink how certain aspects of Drupal's Update API work. Maybe a module maintainer can explicitly state that a specific update hook requires a website to be in maintenance mode.
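To make that last idea concrete, here is a minimal sketch in Python rather than Drupal's PHP. Nothing in it is part of Drupal's Update API: the `PendingUpdate` class, its `requires_maintenance_mode` flag, and the update names are all hypothetical, and the sketch only illustrates the decision, not whether skipping maintenance mode is actually safe.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PendingUpdate:
    """A pending update; the flag is a hypothetical declaration by the maintainer."""

    name: str
    # Default to True so an update with no declaration keeps today's safe behavior.
    requires_maintenance_mode: bool = True


def deployment_needs_maintenance_mode(updates: Iterable[PendingUpdate]) -> bool:
    """Return True if at least one pending update asks for maintenance mode."""
    return any(update.requires_maintenance_mode for update in updates)


pending = [
    PendingUpdate("example_update_1", requires_maintenance_mode=False),
    PendingUpdate("example_update_2"),  # undeclared, so treated as requiring it
]
print(deployment_needs_maintenance_mode(pending))  # prints True
```

Defaulting the flag to "required" means a module that never declares anything behaves exactly as it does today, so only updates a maintainer has explicitly reviewed could ever skip maintenance mode.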
Stepping back from the technical challenges and issues around deploying code, what is needed when deploying changes to a production environment is...
A safe environment that won't have any data loss and provides the optimal user experience.
How to provide a safe environment for deploying updates?
When I initially stated that a site's homepage is replaced with Drupal's maintenance page when changes are deployed, I ignored the current poor man's workaround: most enterprise sites are heavily cached by a reverse proxy. When changes are deployed, the reverse proxy can continue to serve cached pages. In many cases, a site appears to be up and running for most anonymous users as long as they are requesting cached pages. As soon as they click a form's submit button or log in, they will see a maintenance page. Relying on cached pages during deployments is the solution recommended by most Drupal hosting providers.
Serving a cached site is essentially providing users with a "read-only" user experience, except certain actions will unexpectedly result in a maintenance page. This solution still provides a somewhat broken and unpredictable user experience. The notion that a read-only version of a website can be served to end-users when deploying changes made me wonder…
Instead of displaying a maintenance page site-wide, could a site be switched to read-only mode during deployments?
What is a read-only mode?
"Provides an alternate to the built in Maintenance Mode in Drupal. Instead of displaying a static text file to users while the site is in maintenance mode, Read Only Mode will allow access (reading) of new content while preventing the addition of new content (posting / submitting forms / etc)."
In read-only mode, forms are replaced with the following message.
Site is currently in maintenance. During this maintenance it is not possible to change site content (like comments, pages and users).
A read-only Drupal instance should...
- Allow users to access content.
- Disable content editing, comments, and webform submissions.
- Display custom messages when a user can't create or update content.
An interesting nuance to read-only forms and comments is that if the application data is stored remotely, a site may not have to disable forms and comments. For example, if a website uses Disqus for comments, then it would not need to disable comments. If a webform does not save results to the database and instead either sends email notifications or posts submission data to a remote third-party server, a site's webforms might also not need to be disabled.
How can a site safely be switched to read-only mode?
The read-only mode must prevent creating, updating, and deleting any records stored in the database. Ideally, the entire database should be set to read-only access. Given how complicated a Drupal site can be, this is not realistic. The critical thing is for the read-only mode to prevent users from writing data that might be lost.
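One way to picture "prevent users from writing data" is a guard that sits in front of the application and rejects write requests. The following is a minimal Python sketch, not how the Read only mode module is implemented. It assumes that writes only happen through unsafe HTTP methods such as POST, which is a simplification for a real Drupal site; the `guard_request` function and `Response` class are hypothetical names, and the 503 status is an illustrative choice.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: requests using these methods never write to the database.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})

READ_ONLY_MESSAGE = (
    "Site is currently in maintenance. During this maintenance it is not "
    "possible to change site content (like comments, pages and users)."
)


@dataclass(frozen=True)
class Response:
    status: int
    body: str


def guard_request(method: str, read_only: bool) -> Optional[Response]:
    """Return a blocking response for a write attempted in read-only mode.

    Returns None when the request may continue to the application.
    """
    if read_only and method.upper() not in SAFE_METHODS:
        return Response(status=503, body=READ_ONLY_MESSAGE)
    return None


print(guard_request("GET", read_only=True))           # prints None
print(guard_request("POST", read_only=True).status)   # prints 503
```

A guard like this only covers user-initiated writes. Cron jobs, queue workers, and any page that writes on a GET request would need their own handling.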
Another challenge is this: if a site is set to read-only while code and database updates are being executed, it is still possible to run into cache clear, locking, and performance issues. The best solution is to create an isolated read-only instance that is independent of the production instance while code is being deployed.
Enterprise Drupal websites use load-balanced servers to increase reliability and availability through redundancy. Most Drupal hosting providers have some form of redundancy where, if a server stops responding, the website will fail over to another server. Load balancers support calling a health check script, which monitors the server's database and filesystem. The health check script is called every few seconds. If a server's database or filesystem becomes unavailable, the load balancer will direct user traffic to a stable server.
We can apply a similar approach to create a load-balanced environment in which a site is switched to maintenance mode and user traffic is directed to an isolated, read-only instance of a website.
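As a rough illustration of what such a health check script does, here is a minimal Python sketch. It is not any hosting provider's actual script: the function names are hypothetical, `sqlite3` is only a stand-in for the site's real database, and returning 200 or 503 is an assumed convention for signalling health to the load balancer.

```python
import sqlite3
import tempfile
from typing import Callable, Dict, Tuple


def check_filesystem(directory: str) -> bool:
    """Return True if a temporary file can be written inside `directory`."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"ok")
            handle.flush()
        return True
    except OSError:
        return False


def check_database(connection: sqlite3.Connection) -> bool:
    """Return True if a trivial query succeeds (sqlite3 is only a stand-in)."""
    try:
        connection.execute("SELECT 1").fetchone()
        return True
    except sqlite3.Error:
        return False


def health_check(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, bool]]:
    """Run every check and return (HTTP status, per-check results)."""
    results = {name: check() for name, check in checks.items()}
    status = 200 if all(results.values()) else 503
    return status, results


database = sqlite3.connect(":memory:")
status, results = health_check({
    "database": lambda: check_database(database),
    "filesystem": lambda: check_filesystem(tempfile.gettempdir()),
})
print(status, results)
```

Passing the checks in as a dictionary of callables keeps the script easy to extend, which matters for the read-only approach because a maintenance-mode check can be added alongside the database and filesystem checks.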
Here is a step-by-step look at leveraging a read-only server during a Drupal code and database deployment.
- A dedicated read-only copy of a production Drupal site is set up. The "read-only" site must always have the Read only mode module enabled.
- The read-only Drupal site is synced with production nightly or manually. The read-only site must never be synced during a deployment.
- A health check script is set up on the production site, which returns FALSE when the site is switched to maintenance mode.
- When the production site is switched to maintenance mode, the load balancer, using the health check, should direct all traffic to the read-only site.
- Once the production site switches off maintenance mode, the load balancer should now direct all traffic back to the production site.
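The steps above can be sketched as a small simulation. This is a Python toy model under stated assumptions, not a load balancer configuration: the `Site` class and the `route`, `may_sync_read_only_copy`, and `deploy` functions are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Site:
    name: str
    maintenance_mode: bool = False

    def health_check(self) -> bool:
        """Report False while the site is in maintenance mode."""
        return not self.maintenance_mode


def route(production: Site, read_only: Site) -> Site:
    """Send traffic to production only while its health check passes."""
    return production if production.health_check() else read_only


def may_sync_read_only_copy(production: Site) -> bool:
    """The read-only copy must never be synced during a deployment."""
    return not production.maintenance_mode


def deploy(production: Site, run_updates: Callable[[], None]) -> None:
    """Wrap the code and database updates in maintenance mode."""
    production.maintenance_mode = True
    run_updates()  # if this raises, production stays in maintenance mode
    production.maintenance_mode = False


production = Site("production")
read_only = Site("read-only")
served_during_deploy: List[str] = []

# The "updates" here just record which site would receive traffic mid-deployment.
deploy(production, lambda: served_during_deploy.append(route(production, read_only).name))
print(served_during_deploy, route(production, read_only).name)
# prints ['read-only'] production
```

In this model a failed update leaves production in maintenance mode, so traffic keeps flowing to the read-only site until someone intervenes. Whether that is the right failure behavior is a design decision a real implementation would have to make.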
What are some of the downsides to a read-only server?
A load-balanced read-only server is not a perfect solution. End-users will still not be able to submit certain forms and comments. Adding another server to a load-balanced hosting environment increases infrastructure costs. Fortunately, the read-only server is not used frequently and, by being read-only, requires fewer computational resources.
What is missing and what is next for deploying an enterprise Drupal website with minimal downtime?
This blog post just walks through a proof-of-concept of leveraging a read-only server during code and database deployments. Everything discussed needs more thought, work, and community contribution. I suspect some enterprise Drupal sites have come up with other and possibly better solutions for reducing downtime during deployments.
In a previous blog post, I talked about how companies within open source should work together to solve shared and challenging problems to benefit the broader community. Improving deployments is a challenging problem that impacts everyone in the Drupal community. Reducing downtime during deployments helps everyone, especially the customers of Drupal hosting providers. I am sure this question is asked of every hosting provider, and I am optimistic that hosting providers can contribute their ideas, solutions, and resources to solving this problem.
The next step is for an organization or hosting provider to implement a full proof-of-concept and document what it takes to minimize downtime during deployments using a read-only server.