Beyond Science Fiction: Organic Architecture and Cellular Design

I know I already diverged from my original vein of the BASE principle with cryptocurrency. Well, I have another divergence. I was thinking over all the work that goes into health checks in modern architectures, and after talking it over with our other consultants, I think we have an idea that can change how we handle application architecture and design. I would like to introduce Organic Architecture and Cellular Design.

I’m really hoping it takes off. I want to see my name in slide decks around the world. So pardon me if I try to write for the future quote.

Now, what do I mean by Organic Architecture and Cellular Design? Well, a common design practice in robotics is to mimic nature. Similar design principles exist in other disciplines. I think it’s time we bring that into software. In general, Organic Architecture means designing architectures that respond organically to faults, load, and management. What does that look like? Well, let me set aside Cellular Design for a moment, and focus on Organic Architecture by analyzing a popular product.

Let’s take Cloud Foundry as our example. It’s a good product. It’s done well for Netflix, and Netflix has done some amazing work. T-Mobile is growing rapidly, and they’re running on Cloud Foundry. The SpringOne Platform conference was full of success stories. I have more examples from teaching the development and administration certification classes. I like Cloud Foundry. With the addition of Kubernetes, even better. Cloud Foundry, alongside OpenShift, which I’ll mention as another example, is revolutionizing application design, development practices, and team organization. Both Cloud Foundry and OpenShift have systems devoted to monitoring containers’ health: they confirm the containers are running and responsive, and they order a shutdown and redeployment to fix any faults. This is actually a good design. It fits into our concept of Organic Architecture. It’s like an immune system, and the natural dead cell removal process in our own bodies. It identifies damage and removes it. In the process, it signals a regenerative process to begin repair. However, I think this design is also inherently faulty, precisely because it deviates from organic mimicry through the constant health checks.

Let’s look at the health monitoring systems. With few exceptions, every health monitoring system involves running a process on the monitored unit that sends regular announcements to prove it’s alive. It’s kind of like terminating the life of an employee who fails to clock in that morning. An odd practice, to say the least. It leads to a lot of orchestration overhead, which I think will eventually impose a limit on the architecture. At a certain point, you’ll be spending so much time announcing and confirming health that you’ll need to devote additional resources to support that load. Which will generate more load. Before long, we’re in the same trap we found ourselves in with classic replication strategies. Replicating sessions across a whole cluster of application servers reached a limit where the replication work overcame the actual work. Database clusters have this problem. We solved this to some degree with more complex replication strategies. This is why we develop stateless applications. Externalizing state to systems optimized to manage it has become the common practice. It’s what Cloud Foundry and OpenShift are facilitating. So why are we collecting all those health metrics?
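To put a rough number on that overhead, here is a minimal Python sketch of the announce-and-confirm pattern described above. The `HeartbeatMonitor` class, the `simulate` function, and the three-missed-intervals timeout are hypothetical names and assumptions of mine for illustration; this is not how Cloud Foundry or OpenShift actually implement their health checks.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical central monitor that containers must report to."""
    timeout: int                                   # seconds of silence before a container is presumed dead
    last_seen: dict = field(default_factory=dict)  # container id -> time of last announcement
    messages_handled: int = 0                      # work spent purely on "I'm alive" traffic

    def announce(self, container_id: str, now: int) -> None:
        # Record that the container checked in at time `now`.
        self.last_seen[container_id] = now
        self.messages_handled += 1

    def presumed_dead(self, now: int) -> list:
        # Any container silent for longer than the timeout gets flagged.
        return sorted(
            cid for cid, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


def simulate(containers: int, interval: int, duration: int) -> HeartbeatMonitor:
    """Every container announces once per interval for the whole duration."""
    # Assumption: a container is presumed dead after three missed intervals.
    monitor = HeartbeatMonitor(timeout=3 * interval)
    for tick in range(duration // interval):
        now = tick * interval
        for i in range(containers):
            monitor.announce(f"c{i}", now)
    return monitor
```

In this sketch, `simulate(100, 10, 60)` has the monitor handle 600 messages in one minute, none of which carries any application work. The count grows linearly with the number of containers and inversely with the announcement interval.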

Let’s talk about mechanical maintenance on an industrial scale. We have this in datacenters with mechanical disks. SSDs wear out too. Both have a mean time to failure. Every mechanical part has this metric. It’s why we change the oil in our cars. The process of supplying you and everyone else in your city with drive belts is the same industrial concern, at a high level, as maintaining storage drives. Namely, it’s foolish to wait for a specific failure before deploying a replacement. In fact, as that maintenance concern grows in urgency, like in a datacenter or aboard ships and aircraft, it is unwise to wait for a failure at all. You just replace the part when it roughly reaches that mean time to failure. It’s why we have military surplus. By military standards, the item has reached the end of its life. Meaning, it can’t handle the sort of hard use that it once did, and the danger of a failure is so great that it isn’t worth the risk. Thus, they sell it into less critical scenarios, like civilians who don’t need that sort of guarantee from their items. We even do it to humans with mandatory retirement. We can’t risk the failure of a worker in the middle of a project. When a worker reaches that mean time to failure, we retire them. So, as application architectures grow, why are we waiting until we notice that an application has failed?

Given both of those points, why does a container need to repeatedly report that it’s alive, and why is there no retirement plan? I answer with Organic Architecture. What if, like cells in a body, the containers just reached a natural end of life and shut themselves down? Sounds stupid, right? That’s because it is. That’s not cellular at all: cells reproduce and spawn new cells. So let’s truly mimic the organic model. What if containers could deploy replacements for themselves, and shut themselves down when they’re too old? Now, “old” can be defined by a range of different metrics: actual time to live, number of operations, or memory utilization. It can be customized. The operative point is that a container knows when it should spawn a replacement, and when it should shut itself down. Now, this is an iterative design. We can make several small steps to get to this end. In next week’s article, I want to discuss how this Cellular Design benefits us, and then run through the technical implementation.
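As a teaser, here is a minimal Python sketch of how a container might decide for itself that it is “old.” Everything in it is an assumption for illustration: the `CellLimits` and `Cell` names, the idea of taking the worst of the three metrics as the cell’s wear, and the 80% threshold for spawning a replacement. It is not the implementation I’ll walk through next week, just one way the decision could look.

```python
from dataclasses import dataclass


@dataclass
class CellLimits:
    """Hypothetical, customizable definition of 'too old' for a container."""
    max_age_seconds: float
    max_operations: int
    max_memory_bytes: int
    spawn_fraction: float = 0.8  # assumption: spawn a replacement at 80% of any limit


@dataclass
class Cell:
    """A container that tracks its own wear instead of reporting to a monitor."""
    limits: CellLimits
    born_at: float
    operations: int = 0
    memory_bytes: int = 0

    def wear(self, now: float) -> float:
        # Fraction of life used, taken as the worst of the three metrics.
        return max(
            (now - self.born_at) / self.limits.max_age_seconds,
            self.operations / self.limits.max_operations,
            self.memory_bytes / self.limits.max_memory_bytes,
        )

    def should_spawn_replacement(self, now: float) -> bool:
        # Reproduce before dying, so the replacement is ready in time.
        return self.wear(now) >= self.limits.spawn_fraction

    def should_retire(self, now: float) -> bool:
        # Natural end of life: at least one limit has been fully used up.
        return self.wear(now) >= 1.0
```

A cell with a one-hour time to live and a 1,000-operation budget that has served 900 operations after thirty minutes would ask for a replacement but keep working; once the hour is up, it would shut itself down. No central monitor is consulted in either decision.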
