Developing in the Clouds

I’ve seen a lot of interesting evolutionary changes in our development practices over the last 10 years or so, but I don’t think there have been many revolutionary ones.  Until now, that is.  The power of cloud technologies has introduced a new twist in what development teams can utilize: effectively infinite resources.  Yeah, yeah, I know there really is a capacity limit, and I know someone still pays the bill at the end of the day, but it’s still fundamentally different from what we are used to.  Without even realizing it, we have built our development processes around resource constraints: everything from how developers code, to our QE, to our release processes.  But that doesn’t have to be the case anymore, and it doesn’t have to break the bank either.  On OpenShift, we decided early on to build our development process entirely around using as many volatile resources as we needed, to see how productive we could be.  I’ll let you judge for yourself, but to me, the results have been astounding.  Not only is our team more productive than any development team I’ve worked with before, but we are shockingly more cost effective as well.

Let’s take a trip down memory lane first to give you some context from my past experience.  If you roll back the clock 5 years or so, scaling development was painful.  Development essentially ran off desktops, and the operations-controlled environments were rigid and static.  On the development side, I was constantly struggling to get desktops that were powerful enough, and enough of them, to keep the development team productive.  Virtualization wasn’t that prominent yet, so developers often needed two or three desktops to emulate multi-tier systems.  I spent as much time in purchasing and unpacking boxes as I did coding (and on warranty calls when the things broke down).  We constantly had power and stability issues as developers inevitably started hosting shared services, like our test or continuous integration servers, on their desktops.  Power outages every few days would destabilize those services and, every once in a while, kill a hard drive that wasn’t backed up.  On the other hand, our operations-controlled environments, which had stable power and redundant storage, were ridiculously expensive and effectively static.  Those environments were essentially mirrors of production and were expected to be as stable as production.  Developers needed agility, so they stuck to their desktops.  It’s the classic development and operations divide, and I’ve seen this same scenario play out time and time again.

When we started OpenShift, we wanted to approach it differently.  Sure, everyone still gets a laptop (or in some cases a desktop) for a personalized development environment, but that is where the physical resources stop.  Developers can spin up as many mini-environments as they need and sync their local code to them.  They can use those mini-environments to kick off various suites of tests as well as to do ad-hoc testing of their own.  They might only need a single environment if they are working on a single feature, or they might need a dozen.  The focus for us was on ease of consumption: a single command and a 30-second wait is all it takes to spin up a new environment running the latest stable build.  Another command synchronizes their local changes to that environment.  The most important thing is that there is no limit to the number of environments they can wield: whatever makes them productive.
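To make that two-command workflow concrete, here’s a minimal sketch of what “spin up” and “sync” could look like behind the scenes, assuming an AWS-style IaaS driven through boto3 and rsync.  The names here (DEV_AMI, spin_up, sync_workspace) are hypothetical stand-ins for illustration, not our actual tooling, which I’ll link to in a moment.

```python
#!/usr/bin/env python3
"""Illustrative sketch only: DEV_AMI, spin_up(), and sync_workspace()
are hypothetical stand-ins for the workflow described above."""
import subprocess

import boto3

EC2 = boto3.resource("ec2")
DEV_AMI = "ami-0123456789abcdef0"  # assumed: an image baked from the latest stable build


def spin_up(instance_type: str = "t3.medium"):
    """Launch a fresh, throwaway dev environment from the stable-build image."""
    instance = EC2.create_instances(
        ImageId=DEV_AMI,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "purpose", "Value": "dev-throwaway"}],
        }],
    )[0]
    instance.wait_until_running()
    instance.reload()  # refresh metadata to pick up the public DNS name
    return instance


def sync_workspace(instance, local_dir: str = "./", user: str = "ec2-user"):
    """Push local changes to the environment; rsync only ships the deltas."""
    subprocess.run(
        ["rsync", "-az", "--delete", local_dir,
         f"{user}@{instance.public_dns_name}:~/workspace/"],
        check=True,
    )
```

The key design point is that the image already contains the latest stable build, so the only data that moves at spin-up time is the developer’s local diff, which is what keeps the whole operation in the seconds-to-a-minute range.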

But let’s talk cost effectiveness.  How can we provide something like this and not completely break the bank?  Well, this is where IaaS pricing can start benefiting the consumer.  Before you get there, however, you have some challenges to solve.  The first is the developer habit of keeping long-running machines around.  Any time developers wanted to test something, we wanted them to be able to spin up a new machine with their changes, so you have to make it easier to start with a new machine than to keep an old one around.  Our goal was to keep that entire operation under a minute while providing easy access to various levels of the codebase, and that did the trick for us.  Once you’ve broken the dependence on long-running machines, you can start to take advantage of the hourly pricing most IaaS vendors offer.  If a developer can spin up a machine, run their tests, and finish in an hour, you probably paid about $0.30 for that operation.  If they need 6 hours, you are paying less than $2 (six hours at roughly $0.30 an hour comes to about $1.80).  To help reinforce this model, any machine in our environment that hasn’t been logged into for 6 hours is automatically stopped.  The reality is that developers aren’t great at cleaning up, but once you get the model right, they don’t need to hold on to resources for long.  If you can form the habit of starting each time with a fresh VM, everything else starts to fall into place.
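As a sketch of how that automatic stop could be enforced, the snippet below walks the running throwaway instances and stops any whose last recorded login is older than the cutoff.  It assumes, hypothetically, that each machine updates a "last-login" tag with a timezone-aware ISO-8601 timestamp from a login hook; the tag name, the "dev-throwaway" marker, and the exact mechanism are illustrative rather than our actual implementation.

```python
"""Sketch of an idle-environment reaper.  Assumes each dev VM updates a
"last-login" tag (timezone-aware ISO-8601, e.g. 2013-05-01T12:00:00+00:00)
from a login hook; tag names and the cutoff are illustrative."""
from datetime import datetime, timedelta, timezone

import boto3

IDLE_CUTOFF = timedelta(hours=6)
ec2 = boto3.resource("ec2")


def stop_idle_dev_instances():
    now = datetime.now(timezone.utc)
    running = ec2.instances.filter(Filters=[
        {"Name": "tag:purpose", "Values": ["dev-throwaway"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ])
    for instance in running:
        tags = {t["Key"]: t["Value"] for t in (instance.tags or [])}
        last_login = tags.get("last-login")
        # No recorded login, or idle past the cutoff: stop the machine.
        if last_login is None or now - datetime.fromisoformat(last_login) > IDLE_CUTOFF:
            instance.stop()
```

Stopping rather than terminating cuts off the hourly compute charge while keeping the disk around, but the whole point of the model is that developers rarely go back to an old machine when starting a fresh one is faster.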

And yes, being Red Hat, we’ve also open sourced this work (https://github.com/openshift/origin-dev-tools).  While these tools are fairly specialized for OpenShift development, hopefully you’ll find a nugget or two in them that helps in your own process.  And we’re always up for suggestions, so if you see a better way of doing something, please send us a pull request!

Next up, I’ll discuss how we expanded upon this to improve our code submission and review process.  As many readers will know, cutting and testing code is only one part of developer effectiveness, and we’ve been able to use these same techniques to fundamentally change how we look at code.  Ever wonder what that OpenShift GitHub bot commenting on pull requests (e.g. https://github.com/openshift/origin-server/pull/1238) is all about?  If so, stay tuned!