People Break Stuff

Now that we are into the New Year, I’ve had some time to reflect on 2017 and our customers’ experiences, especially on Black Friday, Cyber Monday and through the holidays. One common trend jumped out at all of us as we reviewed where they had difficulty: people break stuff! We used a different word, but “stuff” will do. I’m not talking about people making bad decisions, like not applying security patches as they come out and then wondering why they are suddenly neck-deep in an attack. No, I’m referring to intelligent people, with the best of intentions, but with a few impulse control issues – let me explain.

Their livelihood depends on their site functioning properly, and they care a lot about the stability of their applications. Perhaps too much, though, because if they see something that doesn’t look perfect, they feel the need to touch it and figure out why. This often results in an untracked production change that can spell disaster for their storefront. But I get it: when you care so much about the outcome, it takes tremendous discipline to not touch something, and to not want to fix it right that second.

I’m not suggesting that these otherwise intelligent people should somehow shut down and become somebody else. Quite the contrary, I believe strongly that their need to make things better, and their urgency to get it done, is to be encouraged. I’m more concerned with the environment that allows them to make an untracked hot fix in production in the first place. I think the bigger problem is that they are missing some key tools to help them apply changes safely, without having to endure the usual slow change management process.

So the real problem is a bit more subtle than the obvious risk associated with making a hot fix in production. More typically, I see customers making production changes or deployments with little to no testing in a production-like environment. Often, the only testing they do is on their local box or a dev server that is wildly different from production. This is a problem that I can do something about. Even though we are all delivering our services from the cloud, sometimes we miss the obvious advantage that the cloud affords us.

Last year, my support team completed the transition to a software-defined delivery model where we no longer hand-build any infrastructure. This means that we can push-button deploy the distributed system that makes up a customer’s modern cloud application, by compiling the software description of that system into a running application on whatever cloud provider they choose. We later added the ability to use this same toolset to deploy past versions of the application from backup. This means a staging site is simply a redeployment of the most recent backup of the production system. See where I’m going?

My goal for 2018 is to put this control in the hands of our customers. I want my customers to leverage our software-defined infrastructure in their deployments so that they can do dry runs on a copy of production, taking the risk out of the real deployment. Now, with an old hardware-based mindset, this sounds like a lot of work. But we’re in the cloud, and it’s easy given the cloud-native software we have already developed.

Our next iteration of dry run deployments will be driven from a webhook in the Webscale portal. That means testing a deployment is as simple as clicking a menu item in our portal. With that click, we will deploy a copy of production, install the customer’s latest code release, and execute a set of automated tests, aggregating the results and presenting them back for inspection.
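To make that flow a little more concrete, here is a rough sketch of the three steps in Python. This is not our portal code or any Webscale API; the names `dry_run`, `DryRunReport`, `restore_backup` and `install_release` are made up for illustration, and the two deployment steps are passed in as plain functions so the sketch stays independent of any particular cloud provider.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class DryRunReport:
    """Aggregated outcome of one dry run (hypothetical structure)."""
    release: str
    passed: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)

    @property
    def ok(self) -> bool:
        # The dry run is clean only if no test failed.
        return not self.failed


def dry_run(
    release: str,
    restore_backup: Callable[[], Any],
    install_release: Callable[[Any, str], None],
    tests: Dict[str, Callable[[Any], None]],
) -> DryRunReport:
    """Deploy a copy of production, install a release, run tests, report.

    All three callables are assumptions supplied by the caller:
    - restore_backup() returns a handle to a fresh copy of production
    - install_release(env, release) installs the new code on that copy
    - each test takes the environment handle and raises on failure
    """
    env = restore_backup()            # step 1: copy of production from backup
    install_release(env, release)     # step 2: latest code release
    report = DryRunReport(release=release)
    for name, test in tests.items():  # step 3: automated tests
        try:
            test(env)
            report.passed.append(name)
        except Exception as exc:      # record the failure and keep going
            report.failed[name] = str(exc)
    return report
```

The one design choice worth pointing out is that a failing test does not stop the run. Every test gets executed and the failures are collected, so whoever inspects the report sees the whole picture rather than just the first thing that broke.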

This tool is the first of many coming from Webscale to help eliminate risky deployments and changes from production environments, so that our customers can still act on their urgency to fix things while maintaining 100% uptime.

Jay

Jay lives in the mountains outside of Boulder with his wife and three kids. He loves to cook and chase bears with his dog Ginger. When not chasing bears, Jay works with a talented group of friends building Webscale.