During the first phase of architecting software you try to break up the "application blob" into OS processes. They are the cornerstones of any application. You can view them as large software components: usually they are binary, they are defined by explicit or visual contracts, they are units of deployment and maybe even of reuse.
Because identifying the OS processes in your application is so important, I might have emphasized the notion of a software cell being an OS process a bit too much. You might have gotten the impression that software cells are, in the end, nothing but OS processes. But that's not what I mean. Really. It's just that software cells are most often used to depict an OS process.
So let me try a more formal definition of what I think a software cell is:
A software cell is a unit of code - logical or physical - which operates asynchronously with regard to other units of code.
This clearly is true for an OS process. It runs on its own thread or even its own machine. And so it is also true for an aggregation of processes, e.g. a whole application or a service. They run on one or more threads dedicated to them.
However, I see a process as a physically separate unit of code, whereas a whole application is more like a logical unit of code, since it's not one "thing" you can put your finger on. Instead it's distributed across several physical "things" (OS processes, machines etc.). But you can draw a circle around several processes, thereby aggregating them into a higher-level unit of code, and call that an application.
But what about smaller units of code, smaller than an OS process? They can be software cells, too. A single OS process software cell can contain/host/consist of several in-process software cells. I call them active components or worker components or async components. (More on components in future postings.)
The software cell hierarchy thus extends from the very large to the pretty small, from cross-process to in-process.
Now, although a service is different from an active component, which in turn is not the same as an OS process, they all share one characteristic: they run in parallel with other software cells on the same level. That's what I find important to express across all levels of abstraction.
Why is that important? Because communication between parallel units of code is so different from communication between synchronous units of code. Synchronous, sequential processing within a software cell uses the stack and therefore is secure, fast, reliable etc. Asynchronous units of code, on the other hand, cannot communicate via a stack. They need some other means, which always is an indirection of some kind. That makes async communication much slower than sync communication (by up to 3 orders of magnitude, and sometimes more). And often it also makes communication less reliable, less secure etc. With async communication the "Fallacies of Distributed Computing" kick in. It's because of them that I find it so important to be conscious of where the async units of code are in an architecture. That's why I devote a whole concept with its own visual notation to this. Software cells are supposed to protect your thinking from slipping into one or more of the fallacies.
Software cells run on their own threads. Software cells thus need to communicate with their environment through some medium. This can be an in-proc queue, or some other form of shared memory, or a database, or a TCP connection, or the Windows message pump. You name it.
Whenever you see a relationship between two software cells you immediately know it's a simplification. In reality any communication between the software cells is not direct, but mediated. So some medium or even infrastructure necessarily sits between them. Like a stack between two synchronous methods, but much slower.
This has some implications. Whenever you draw a software cell you need to ask yourself a couple of questions. Here are some that come to mind right away:
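To make "mediated, not direct" a bit more tangible, here is a minimal sketch in Python (used here only as a stand-in; the ideas carry over to .NET). The names `doubling_cell` and `run_demo` as well as the `None` shutdown sentinel are hypothetical choices for this illustration. The point is only that the caller never invokes the cell's code; it drops messages into a medium, in this case an in-proc queue.

```python
import queue
import threading


def doubling_cell(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Hypothetical software cell: runs on its own thread, talks only via queues."""
    while True:
        message = inbox.get()      # blocks until the medium delivers something
        if message is None:        # sentinel chosen for this sketch: shut down
            break
        outbox.put(message * 2)


def run_demo() -> list:
    inbox, outbox = queue.Queue(), queue.Queue()
    cell = threading.Thread(target=doubling_cell, args=(inbox, outbox))
    cell.start()
    for n in (1, 2, 3):
        inbox.put(n)               # no direct call into the cell, only a message
    inbox.put(None)
    cell.join()
    return [outbox.get() for _ in range(3)]


if __name__ == "__main__":
    print(run_demo())
```

The arrow you would draw between the caller and the cell in a diagram corresponds to the two queues here, not to a method call.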
- How is the software cell started and stopped?
- Can the software cell be paused or interrupted?
- How are processing failures reported to the environment of the software cell?
- How should data be passed into and out of the software cell?
- How is shared data protected from inconsistencies through concurrent access by several software cells?
- How can I be informed of certain states during a process executed by a software cell?
- How can I know if a software cell is still alive and progressing?
- How fast, reliable, secure etc. is a software cell and a connection to it?
- Does it matter to a client of a software cell, where and if the software cell is running?
Some of these questions relate directly to the above fallacies. But there are more that need to be answered. In particular, you need to ponder how the communication between a software cell and its environment should take place.
For in-process software cells (active components) you might just use a Queue<T> - with some synchronization code added. For two OS process software cells, though, you'd need to switch to TCP sockets or MSMQ or WCF. Or you use the file system as a medium: one software cell could create and write a file, the other could react to that and read the file.
And there's one more twist to active components: they are running inside an OS process, and so you should decide whether they should be hosted inside a special AppDomain. It's like deciding whether two OS process software cells should run on the same machine or not.
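A few of the questions from the list can be answered quite mechanically for an in-process cell. The following is a rough sketch, not a reference design; `ActiveComponent`, its queues and `seconds_since_heartbeat` are hypothetical names made up for this example, and Python stands in for whatever platform you actually use. It shows one possible answer each for start/stop, data in and out, failure reporting, and a simple "is it still alive" check.

```python
import queue
import threading
import time


class ActiveComponent:
    """Hypothetical in-process software cell with an explicit lifecycle."""

    def __init__(self, handler):
        self._handler = handler
        self.inbox = queue.Queue()     # how data gets in
        self.outbox = queue.Queue()    # how results get out
        self.failures = queue.Queue()  # how failures reach the environment
        self._stop = threading.Event()
        self._last_beat = time.monotonic()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self, timeout=2.0):
        self._stop.set()               # ask the cell to finish its loop
        self._thread.join(timeout)

    def seconds_since_heartbeat(self):
        return time.monotonic() - self._last_beat

    def _run(self):
        while not self._stop.is_set():
            self._last_beat = time.monotonic()   # "still alive and progressing"
            try:
                item = self.inbox.get(timeout=0.05)
            except queue.Empty:
                continue
            try:
                self.outbox.put(self._handler(item))
            except Exception as exc:             # report, do not crash the cell
                self.failures.put((item, exc))


def run_demo():
    cell = ActiveComponent(lambda x: 10 / x)
    cell.start()
    cell.inbox.put(5)
    cell.inbox.put(0)                  # raises ZeroDivisionError inside the cell
    result = cell.outbox.get(timeout=2)
    failed_item, error = cell.failures.get(timeout=2)
    alive = cell.seconds_since_heartbeat() < 1.0
    cell.stop()
    return result, failed_item, type(error).__name__, alive


if __name__ == "__main__":
    print(run_demo())
```

Pausing, progress notifications and protection of shared data are left out on purpose; each of them would need its own explicit answer in the same spirit.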
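The file system variant could look roughly like the sketch below. It is written in Python as a stand-in, and for brevity the two cells are threads here, whereas in the scenario above they would be separate OS processes. The function names, the file name and the polling interval are assumptions of this example. The writer publishes the file under a temporary name first and then renames it, so the reader is less likely to see a half-written file.

```python
import os
import tempfile
import threading
import time


def writer_cell(directory: str, name: str, text: str) -> None:
    """Hypothetical producer cell: writes a file, then publishes it by renaming."""
    tmp_path = os.path.join(directory, name + ".tmp")
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp_path, os.path.join(directory, name))


def reader_cell(directory: str, name: str, results: list, timeout: float = 5.0) -> None:
    """Hypothetical consumer cell: polls for the file and reads it once it appears."""
    path = os.path.join(directory, name)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                results.append(f.read())
            return
        time.sleep(0.01)               # polling interval picked arbitrarily


def run_demo() -> list:
    results = []
    with tempfile.TemporaryDirectory() as directory:
        reader = threading.Thread(target=reader_cell, args=(directory, "order.txt", results))
        reader.start()
        writer_cell(directory, "order.txt", "hello from the other island")
        reader.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

Polling is the simplest way to "react" to a file; a file system watcher would be the more responsive choice, at the price of more infrastructure between the cells.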
As you can see, co-locating or not is an issue you have to decide on, too, at every level of abstraction in your architecture.
Think of software cells as islands with their own population. The people living on an island happily work on some tasks - at their own speed.
You can communicate with them, e.g. by sending a message in a bottle or calling them via satellite or dropping something from a plane. The islands can be located close to each other in the same sea or exist far apart in different oceans. Most importantly they all work pretty autonomously on their own threads.
What's the benefit of all this?
- Thinking in terms of async units of code on several levels of abstraction makes you very conscious with regard to the costs and means of distribution.
- As long as you model your applications with software cells you keep it flexible with regard to the topology. Software cells can be moved around between processes and machines etc. comparatively easily. This is due to the explicit medium always used to communicate between them. It naturally decouples them.
- Since software cells are basically distributable units of code (remember: always indirect communication through some medium), you get a head start in terms of scalability. A software cell can be "multiplied" pretty easily to compensate for higher load. Asynchronous processing is the main road to scalability, not faster processors. Software cells make scale-out easier.
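The "multiplying" mentioned in the last point can be sketched like this, again in Python as a stand-in and with hypothetical names (`worker_cell`, `run_scaled`). Because the cell only knows its inbox and outbox, running several copies of it against the same medium requires no change to the cell itself.

```python
import queue
import threading


def worker_cell(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Hypothetical software cell: squares whatever arrives until told to stop."""
    while True:
        item = inbox.get()
        if item is None:               # one sentinel per copy shuts that copy down
            break
        outbox.put(item * item)


def run_scaled(items, copies: int) -> list:
    items = list(items)
    inbox, outbox = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=worker_cell, args=(inbox, outbox))
        for _ in range(copies)
    ]
    for worker in workers:
        worker.start()
    for item in items:
        inbox.put(item)
    for _ in workers:
        inbox.put(None)
    for worker in workers:
        worker.join()
    # Results arrive in no guaranteed order once there are several copies.
    return sorted(outbox.get() for _ in items)


if __name__ == "__main__":
    print(run_scaled(range(1, 6), copies=3))
```

Note that this only shows the structure. Threads in CPython do not run CPU-bound code in parallel, so for real scale-out the copies would live in separate processes or on separate machines, with a cross-process medium in place of the in-proc queue.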
