Every few years, a new technology comes along that revolutionizes the way we work in IT. Ten years ago it was virtualization, which paved the way for cloud-based services and computing. Now, it’s all about containers and the vibrant ecosystem they are creating. In this two-part series, we will show how we manage and run compute infrastructure at IoT scale on top of both Mesos and Docker.
This container revolution is due in large part to the rise of DevOps in general and the success of Docker in particular. Containers are a perfect complement to the popular “microservice architecture,” which makes it possible to design software applications as suites of independently deployable services. Here at Samsung, we have fully embraced this emerging trend.
SAMI is a complex platform with many moving parts. As of this writing, we have more than 40 internal services (and growing!), along with some of the most popular back-end technologies, including NoSQL datastores, message brokers, service registries, configuration stores, graph databases, HDFS, big data processors, in-memory caches and traditional SQL databases. It’s a constantly evolving platform, where developers introduce new technologies and applications to the stack to address challenges around big data processing for IoT. Our DevOps team is responsible for designing and managing the infrastructure that supports this workload, ensuring scalability, security and compliance, all while remaining agile!
About a month ago, we moved our SAMI platform to run on Mesos and Docker. You can think of Mesos as a kernel for your entire datacenter: it abstracts away the underlying hardware and VMs, letting you program against the datacenter as if it were one giant compute resource. Docker, meanwhile, is a containerization technology that simplifies the way we package and ship applications.
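To make the packaging side concrete, here is a minimal sketch of a Dockerfile for a hypothetical JVM-based microservice. The base image, artifact name and port are illustrative assumptions, not details of our actual stack:

```dockerfile
# Sketch only: package a hypothetical JVM-based microservice.
# Base image, jar name and port are assumptions for illustration.
FROM openjdk:8-jre-alpine

# Copy the pre-built application artifact into the image.
COPY target/device-registry.jar /opt/app/device-registry.jar

# The service listens on this port inside the container.
EXPOSE 8080

# Launch the service; one process per container.
CMD ["java", "-jar", "/opt/app/device-registry.jar"]
```

Once built (`docker build -t device-registry .`), the same image runs unchanged on a laptop or on any machine in the datacenter, which is what makes the “one giant compute resource” model practical.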
This is a true paradigm shift in the way we think about application packaging, deployment, orchestration and monitoring. It required a complete redesign of our automation pipeline, bringing in many new and exciting technologies and retiring some older, brittle tools.
The push toward container technology
Even before containers became the hottest topic in town, we had a decent, well-rounded automation pipeline, at the core of which was our Configuration Management (CM) system. From provisioning to compliance to application deployment, everything was automated through our CM tools (Chef and Saltstack).
However, as our platform began to grow, some of the tools’ shortcomings became apparent. In order to support and scale an increasingly complex system like SAMI, where new features are being introduced at a rapid pace, we realized we needed a new approach for the growing number of microservices that must be deployed and managed.
Here are some of the limitations that needed to be addressed (note: these are common pitfalls of CM tools in general, not problems specific to our implementation):
- Node/machine-specific perspective: deployments are pinned to machines, as in “run ‘this’ on ‘that’ VM”
- Static partitioning: machines are dedicated to specific services, so multi-tenancy requires manual configuration and spare capacity is wasted
- No resource isolation
- Longer provisioning and deploy times
- No dependency/workflow management: can’t enforce “deploy Service-B only after Service-A’s health check passes”
- No self-healing: When a machine dies, an operator has to manually replace the dead node
- Heterogeneous infrastructure is not easy: cookbooks/modules/playbooks rarely work across more than two distros
- Steep learning curve
To run a modern platform at IoT scale, these limitations were unacceptable for us.
Enter Mesos and Docker to help turn things around. In Part 2 of this series, we explain how it was done for SAMI.
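As a preview of how a Mesos-based stack addresses the gaps above, here is a sketch of an application definition in the style of Marathon, a scheduler that runs on top of Mesos. The service name, image and values are illustrative assumptions; the point is that resource isolation (`cpus`/`mem`), self-healing (`instances`) and health-check-gated deployment are all declared in one place instead of being hand-wired per machine:

```json
{
  "id": "/sami/device-registry",
  "container": {
    "type": "DOCKER",
    "docker": { "image": "registry.example.com/device-registry:1.0" }
  },
  "cpus": 0.5,
  "mem": 512,
  "instances": 3,
  "healthChecks": [
    { "protocol": "HTTP", "path": "/health", "gracePeriodSeconds": 30 }
  ]
}
```

Because the scheduler, rather than an operator, decides placement, tasks from a dead node are restarted elsewhere automatically, and any machine with spare capacity can run any service.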
Top image: Louis Vest
Body image: Blizzy78