The art of good design is that of choosing the "best" way to meet a set of requirements. The art of good concurrent
system design is often that of choosing the simplest way to satisfy the needs for concurrency. One of the first rules
for designers should be to avoid reinventing the wheel. Good design patterns and design idioms have been developed to
solve most problems. Given the complexity of concurrent systems it only makes sense to use well-proven solutions and to
strive for simplicity of design.
Concurrent tasks that take place entirely within a computer, are called threads of execution. Like all
concurrent tasks, threads of execution are an abstract concept since they occur in time. The best we can do to
physically capture a thread of execution is to represent its state at a particular instant in time.
The most direct way of representing concurrent tasks using computers is to dedicate a separate computer to each task.
However, this usually too expensive and is not always conducive to conflict resolution. It is common, therefore, to
support multiple tasks on the same physical processor through some form of multi-tasking. In this case, the
processor and its associated resources such as memory and busses are shared. (Unfortunately, this sharing of resources
may also lead to new conflicts that were not present in the original problem.)
The most common form of multi-tasking is to provide each task with a "virtual" processor. This virtual processor is
typically referred to as a process or task. Normally, each process has its own address space that is logically
distinct from the address space of other virtual processors. This protects processes on conflicting with each other
against accidental overwrites of each other's memory. Unfortunately, the overhead required to switch the physical
processor from one process to another is often prohibitive. It involves significant swapping of register sets within
the CPU (context switching) that even with modern high-speed processors may take hundreds of microseconds.
To reduce this overhead, many operating systems provide the ability to include multiple lightweight threads
within a single process. The threads within a process share the address space of that process. This reduces the
overhead involved in context switching, but increases the likelihood of memory conflicts.
For some high-throughput applications, even the overhead of lightweight thread switching may be unacceptably high. In
such situations it is common to have an even lighter-weight form of multi-tasking that is achieved by taking advantage
of some special features of the application.
The concurrency requirements of the system can have a dramatic impact upon the architecture of the system. The decision
to move functionality from a uni-process architecture to a multi-process architecture introduces significant changes to
the structure of the system, in many dimensions. Additional mechanisms (e.g. remote procedure calls) may need to be
introduced which may substantially change the architecture of the system.
System availability requirements must be considered, as well as the additional overhead of managing the additional
processes and threads.
As with most architectural decisions, changing the process architecture effectively trades one set of problems for
another:
Approach
|
Advantages
|
Disadvantages
|
Uni-process, no threads
|
-
Simplicity
-
Fast intra-process messaging
|
-
Hard to balance workload
-
Can't scale to multiple processors
|
Uni-process, multi-threaded
|
-
Fast intra-process messages
-
Multi-tasking without inter-process communication
-
Better multi-tasking without the overhead of 'heavyweight' processes
|
-
Application must be 'thread-safe'
-
Operating system must have efficient thread-management
-
Shared memory issues need to be considered
|
Multi-process
|
-
Scales well as processors are added
-
Relatively easy to distribute across nodes
|
-
Sensitive to process boundary: using inter-process communication too much hurts performance
-
Swapping and context switches are expensive
-
Harder to design
|
A typical evolutionary path is to start with a uni-process architecture, adding processes for groups of behaviors that
need to occur simultaneously. Within these broader groupings, consider additional needs for concurrency, adding threads
within processes to increase concurrency.
The initial starting point is to assign many active objects to a single operating system task or thread, using a
purpose-built active object scheduler - this way it is usually possible to achieve a very lightweight simulation of
concurrency, although, with a single operating system task or thread, it will not be possible to take advantage of
machines with multiple CPUs. The key decision is to isolate blocking behavior in separate threads, so that
blocking behavior does not become a bottleneck. This will result in a separation of active objects with blocking
behavior into their own operating system threads.
In real-time systems, this reasoning applies equally to capsules - each capsule has a logical thread of control, which
may or may not share an operating system thread, task or process with other capsules.
Unfortunately, like many architectural decisions, there are no easy answers; the right solution involves a carefully
balanced approach. Small architectural prototypes can be used to explore the implications of a particular set of
choices. In prototyping the process architecture, focus on scaling the number of processes up to the theoretical
maximums for the system. Consider the following issues:
-
Can the number of processes be scaled up to the maximum? How far beyond the maximum can the system be pushed? Is
there allowance for potential growth?
-
What is the impact of changing some of the processes to lightweight threads which operate in a shared process
address space?
-
What happens to response time as the number of processes are added? As the amount of inter-process communication
(IPC) is increased? Is there noticeable degradation?
-
Could the amount of IPC be reduced by combining or reorganizing processes? Would such a change result in large
monolithic processes which are difficult to load-balance?
-
Can shared memory be used to reduce IPC?
-
Should all processes get "equal time" when time resources are allocated? Is it possible to carry the time
allocation? Are there potential draw-backs to changing the scheduling priorities?
Active objects can communicate with each other synchronously or asynchronously. Synchronous communication is useful
because it can simplify complex collaborations through strictly controlled sequencing. That is, while an active object
is executing a run-to-completion step that involves synchronous invocations of other active objects, any concurrent
interactions initiated by other objects can be ignored until the full sequence is completed.
While this is useful in some cases, it can also be problematic since it can happen that a more important high-priority
event may have to wait (priority inversion). This is exacerbated by the possibility that the synchronously invoked
object may itself be blocked waiting on a response to a synchronous invocation of its own. This can lead to unbounded
priority inversion. In the most extreme case, if there is circularity in the chain of synchronous invocations, it can
lead to deadlock.
Asynchronous invocations avoid this problem enabling bounded response times. However, depending on the software
architecture, asynchronous communication often leads to more complex code since an active object may have to respond to
several asynchronous events (each of which might entail a complex sequence of asynchronous interactions with other
active objects) at any time. This can be very difficult and error prone to implement.
The use of an asynchronous messaging technology with assured message delivery can simplify the application programming
task. The application can continue operation even if the network connection or remote application is unavailable.
Asynchronous messaging does not preclude using it in a synchronous mode. Synchronous technology will require a
connection to be available whenever the application is available. Because a connection is known to exist, handling
commit processing may be easier.
In the approach recommended in the Rational Unified Process for real-time systems, capsules
communicate asynchronously through the use of signals,
according to particular protocols. It is possible, nevertheless to achieve synchronous
communication through the use of signal pairs, one in each direction.
Although the context-switching overhead of active objects may be very low, it is possible that some applications may
still find that cost unacceptable. This typically occurs in situations where large amounts of data need to be processed
at a high rate. In those cases, we may have to fall back to using passive objects and more traditional (but higher
risk) concurrency management techniques such as semaphores.
These considerations, however, do not necessarily imply that we must abandon the active object approach altogether.
Even in such data-intensive applications, it is often the case that the performance sensitive part is a relatively
small portion of the overall system. This implies that the rest of the system can still take advantage of the active
object paradigm.
In general, performance is only one of the design criteria when it comes to system design. If the system is complex,
then other criteria such as maintainability, ease of change, understandability, etc. are equally if not even more
important. The active object approach has a clear advantage since it hides much of the complexity of concurrency and
concurrency management while allowing design to be expressed in application-specific terms as opposed to low-level
technology-specific mechanisms.
Concurrent components with no interactions are an almost trivial problem. Nearly all of the design challenges have to
do with interactions among concurrent tasks, so we must first focus our energy on understanding the interactions. Some
of the questions to ask are:
-
Is the interaction one-directional, bi-directional, or multi-directional?
-
Is there a client-server or master slave relationship?
-
Is some form of synchronization required?
Once the interaction is understood, we can think about ways to implement it. The implementation should be selected to
yield the simplest design consistent with the performance goals of the system. Performance requirements generally
include both overall throughput and acceptable latency in the response to externally generated events.
These issues are even more critical for real-time systems, which are often less tolerant of variations in performance,
for example 'jitter' in response time, or missed deadlines.
It is bad practice to embed specific assumptions about external interfaces throughout an application, and it is very
inefficient to have several threads of control blocked waiting for an event. Instead, assign a single object the
dedicated task of detecting the event. When the event occurs, that object can notify any others who need to know about
the event. This design is based upon a well-known and proven design pattern, the "Observer" pattern [GAM94]. It can easily be extended for even greater flexibility to the
"Publisher-Subscriber Pattern," where a publisher object acts as intermediary between the event detectors and the
objects interested in the event ("subscribers") [BUS96].
Actions in a system may be triggered by the occurrence of externally generated events. One very important externally
generated event may be simply the passage of time itself, as represented by the tick of a clock. Other external events
come from input devices connected to external hardware, including user interface devices, process sensors, and
communication links to other systems. This is overwhelmingly true for real-time systems, which typically have high
connectivity with the outside world.
In order for software to detect an event, it must be either blocked waiting for an interrupt, or periodically checking
hardware to see if the event has occurred. In the latter case, the periodic cycle may need to be short to avoid missing
a short lived event or multiple occurrences, or simply to minimize the latency between the event's occurrence and
detection.
The interesting thing about this is that no matter how rare an event is, some software must be blocked waiting for it
or frequently checking for it. But many (if not most) of the events a system must handle are rare; most of the time, in
any given system, nothing of any significance is happening.
The elevator system provides many good examples of this. Important events in the life of an elevator include a call for
service, passenger floor selection, a passenger's hand blocking the door, and passing from one floor to the next. Some
of these events require very time-critical response, but all are extremely rare compared to the time-scale of the
desired response time.
A single event may trigger many actions, and the actions may depend upon the states of various objects. Furthermore,
different configurations of a system may use the same event differently. For example, when an elevator passes a floor
the display in the elevator cab should be updated and the elevator itself must know where it is so that it knows how to
respond to new calls and passenger floor selections. There may or may not be elevator location displays at each floor.
Polling is expensive; it requires some part of the system to periodically stop what it is doing to check to see if an
event has occurred. If the event must be responded to quickly, the system will have to check for event arrival quite
frequently, further limiting the amount of other work which can be accomplished.
It is far more efficient to allocate an interrupt to the event, with the event-dependent code being activated by the
interrupt. Though interrupts are sometimes avoided because they are considered "expensive", using interrupts
judiciously can be far more efficient than repeated polling.
Cases where interrupts would be preferred as an event-notification mechanism are those where event arrival is random
and infrequent, such that most polling efforts find that the event had not occurred. Cases where polling would be
preferred are those in which events arrive in a regular and predictable manner and most polling efforts find that the
event has occurred. In the middle, there will be a point at which one is indifferent to either polling or reactive
behavior - either will do equally well and the choice matters little. In most cases, however, given the randomness of
events in the real world, reactive behavior is preferred.
Broadcasting data (typically using signals) is expensive, and is typically wasteful - only a few objects may be
interested in the data, but everyone (or many) must stop to examine it. A better, less resource consumptive approach is
to use notification to inform only those objects who are interested that some event has occurred. Restrict broadcasting
to events which require the attention of many objects (typically timing or synchronization events).
More specifically:
-
Use passive objects and synchronous method invocations where concurrency is not an issue but instantaneous response
is.
-
Use active objects and asynchronous messages for the vast majority of application-level concurrency concepts.
-
Use OS threads to isolate blocking elements. An active object can be mapped to an OS thread.
-
Use OS processes for maximum isolation. Separate processes are needed if programs need to be started up and shut
down independently, and for subsystems which may need to be distributed.
-
Use separate CPUs for physical distribution or for raw horsepower.
Perhaps the most important guideline for developing efficient concurrent applications is to maximize the use of the
lightest weight concurrency mechanisms. Both hardware and operating system software play a major part in supporting
concurrency, but both provide relatively heavy-weight mechanisms, leaving a great deal of work to the application
designer. We are left to bridge a big gap between the available tools and the needs of concurrent applications.
Active objects help to bridge this gap by virtue of two key features:
-
They unify the design abstractions by encapsulating the basic unit of concurrency (a thread of control) which can
be implemented using any of the underlying mechanisms provided by the OS or CPU.
-
When active objects share a single OS thread, they become a very efficient, light-weight concurrency mechanism
which would otherwise have to be implemented directly in the application.
Active objects also make an ideal environment for the passive objects provided by programming languages. Designing a
system entirely from a foundation of concurrent objects without procedural artifacts like programs and processes leads
to more modular, cohesive, and understandable designs.
In most systems less than 10% of the code uses more than 90% of the CPU cycles.
Many system designers act as though every line of code must be optimized. Instead, spend your time optimizing the 10%
of the code that runs most often or takes a long time. Design the other 90% with an emphasis on understandability,
maintainability, modularity, and ease of implementation.
The non-functional requirements and the architecture of the system will affect the choice of mechanisms used to
implement remote procedure calls. An overview of the kinds of trade-offs between alternatives is presented
below.
Mechanism
|
Uses
|
Comments
|
Messaging
|
Asynchronous access to enterprise servers
|
Messaging middleware can simplify the application programming task by handling queuing, timeout and
recovery/restart conditions. You can also use messaging middleware in a pseudo-synchronous mode.
Typically, messaging technology can support large message sizes. Some RPC approaches may be limited in
message sizes, requiring additional programming to handle large messages.
|
JDBC/ODBC
|
Database calls
|
These are database-independent interfaces for Java servlets or application programs to make calls to
databases that may be on the same or another server.
|
Native interfaces
|
Database calls
|
Many database vendors have implemented native application program interfaces to their own databases
which offer a performance advantage over ODBC at the expense of application portability.
|
Remote Procedure Call
|
To call programs on remote servers
|
You may not need to program at the RPC level if you have an application builder that takes care of this
for you.
|
Conversational
|
Little used in e-business applications
|
Typically low-level program-to-program communication using protocols such as APPC or Sockets.
|
Many systems require concurrent behavior and distributed components. Most programming languages give us very little
help with either of these issues. We have seen that we need good abstractions to understand both the need for
concurrency in applications, and the options for implementing it in software. We have also seen that, paradoxically,
while concurrent software is inherently more complex than non-concurrent software, it is also capable of vastly
simplifying the design of systems which must deal with concurrency in the real world.
|