What even is OpenStack?

A few days ago, I needed to learn what OpenStack is. Here’s what I’ve learned so far.

OpenStack is a collection of open-source “cloud infrastructure” services. One way to think of it is as an alternative to Amazon Web Services for those who want to operate their own data center instead of paying Amazon.[1] Several of the OpenStack services are roughly analogous to Amazon services. For instance, OpenStack Nova is an alternative to Amazon EC2; OpenStack Swift is an alternative to Amazon S3; and OpenStack Cinder is an alternative to Amazon EBS. The OpenStack people publish a long list of organizations that use OpenStack in various ways.

As you might expect, OpenStack doesn’t have anywhere near all the features that AWS does, but the basics are there. The essential parts of an OpenStack installation are what they call the “Core Services”: Nova (which manages compute instances), Swift (a replicated object storage service), Keystone (authentication services), Neutron (for defining networks), Cinder (persistent block storage), and Glance (virtual machine disk image storage). Of those, Nova, the compute service, is the one the OpenStack people say is the most mature; a friend described Nova to me as “the beating heart” of an OpenStack installation. (From my perspective, it’s impressive how transparent the OpenStack developers are with the “maturity” ratings they give to various components of the project. Nova, for instance, gets a score of 7 out of 8, while Cinder scores only 5 out of 8. I have no idea how these scores are calculated, though.)
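To keep the names straight, here's a minimal Python sketch that records the six Core Services and the AWS counterparts mentioned in this post. The `Service` class and the `aws_counterpart` helper are hypothetical names made up for illustration; they are not part of any OpenStack API. Where this post names no AWS analogue, the counterpart is left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Service:
    """One OpenStack Core Service, as described in this post."""
    name: str
    role: str
    # AWS analogue named in the post; None where the post names none.
    aws_counterpart: Optional[str] = None


# The six Core Services, in the order the post lists them.
CORE_SERVICES = [
    Service("Nova", "manages compute instances", "EC2"),
    Service("Swift", "replicated object storage", "S3"),
    Service("Keystone", "authentication services"),
    Service("Neutron", "defining networks"),
    Service("Cinder", "persistent block storage", "EBS"),
    Service("Glance", "virtual machine disk image storage"),
]


def aws_counterpart(service_name: str) -> Optional[str]:
    """Return the AWS analogue for a Core Service (case-insensitive lookup)."""
    for service in CORE_SERVICES:
        if service.name.lower() == service_name.lower():
            return service.aws_counterpart
    raise KeyError(f"not a Core Service: {service_name}")


if __name__ == "__main__":
    for service in CORE_SERVICES:
        analogue = service.aws_counterpart or "(none named here)"
        print(f"{service.name}: {service.role}; AWS analogue: {analogue}")
```

Calling `aws_counterpart("nova")` returns `"EC2"`, while `aws_counterpart("Keystone")` returns `None`, because this post doesn't name an Amazon equivalent for it.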

OpenStack offers both “platform as a service” components (like databases) and “infrastructure as a service” components (like load balancing). I’m told that this differentiates it from, for example, Cloud Foundry, which provides solely “platform as a service” components. Indeed, it’s possible to run Cloud Foundry on top of OpenStack, although I’m not sure how common this is.

OpenStack development is supported by a loose federation of industry sponsors. You can see which companies contribute the most. It seems to me that while Amazon services “only” have to run in Amazon’s data centers, OpenStack services must be able to run on a wide variety of hardware and OS platforms, in a wide variety of data center settings. So it makes sense that hardware and OS makers would contribute to OpenStack, because they want it to work as well as possible when it’s running on their stuff.

Finally, various third-party applications can be run on top of an OpenStack installation. To pick one example, Hadoop can run on top of OpenStack: one of the “Optional Services” OpenStack offers is something called Sahara, the OpenStack counterpart to Amazon EMR, Amazon’s Hadoop service. Here’s a talk about running Hadoop on OpenStack; it mentions Savanna, which is what Sahara used to be called. And here’s a blog post describing Hadoop as “the perfect app for OpenStack”; it has a diagram explaining how Hadoop, Sahara (née Savanna), and core OpenStack fit together.

Thanks to James Porter and David Karapetyan, who helped me understand what OpenStack even is.

  1. It’s also possible to rent out space on an OpenStack installation that someone else runs. The practical advantage of doing this rather than using Amazon might be that you have the freedom to move to a different OpenStack installation, should you ever want to.