Thomas Huining Feng
MSDL, School of Computer Science, McGill
http://msdl.cs.mcgill.ca/people/tfeng/
hfeng2@cs.mcgill.ca
For more information on the latest version, see the SVM page here.
|                | Realtime Simulation (Execution) | Scaled-time Simulation | As-fast-as-possible Simulation |
|----------------|---------------------------------|------------------------|--------------------------------|
| Single-machine |                                 |                        |                                |
| Distributed    |                                 |                        | timewarp (in process)          |
Timewarp-enabled Distributed SVM uses the optimistic scheme to implement timewarp simulation. Every component is itself a model in the extended statechart formalism, running in a standalone SVM instance; in other words, multiple logical processes (LPs) run in multiple operating-system processes. Running multiple logical processes sequentially is done by model importation. Running multiple logical processes in multiple threads will be studied in the future.
|                     | Sequential        | multi-threaded | multi-processed        |
|---------------------|-------------------|----------------|------------------------|
| LP (tight coupling) | model importation | future study   | impossible             |
| LP (loose coupling) | future study      | future study   | timewarp (in process)  |
SVM instances may reside on the same machine or be distributed across the network. Two libraries have been built to support this distributed computation: one based on PVM, the other based on CORBA. The choice between them is left to the designer.
Every component is allowed to advance as far as possible, until something is found to be wrong in the system: a component receives a message from another component tagged with a time in its past. In this case, the receiver component must roll back to exactly the message time, and all the components it has contacted after that time must also roll back accordingly. (Those components may request further rollbacks in turn.)
The global virtual time (GVT) is defined as

GVT = min( min_i T_i , min_j t_j )

where

- T_i is the component local time of component i;
- t_j is the delivery time of message j, which has been sent out by the sender but has not yet been received (it is still on the way).

GVT has a special implication: all behavior in its past can be forgotten safely, since the system will never roll back to a time less than the GVT.
It is not easy to determine GVT. One important reason is that it takes a variable amount of time to transfer a message; that is, min_j t_j (the minimum over messages still in transit) is hard to determine, especially for a design with a separate Clock component (discussed later on this page).
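As a rough illustration only (not SVM's implementation), the GVT computation can be written down directly from the definition above; the function and argument names below are hypothetical:

```python
# Sketch: computing GVT from component local times and the delivery times
# of messages that have been sent but not yet received.
# local_times and in_transit_times are illustrative names, not SVM's API.

def compute_gvt(local_times, in_transit_times):
    """local_times: {component id: local virtual time}
    in_transit_times: delivery times of messages still on the way."""
    candidates = list(local_times.values()) + list(in_transit_times)
    return min(candidates)

# Example: three components and one message still in transit.
print(compute_gvt({"A": 12.0, "B": 15.5, "C": 9.0}, [10.5]))   # -> 9.0
```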
This section discusses some preparations for timewarp simulation. Though they are crucial for timewarp, they can also be used for other purposes.
One issue of distributed simulation is how to locate different components and connect them. These components may run on a single machine or be distributed across a network. Their locations may be statically decided at design time, but in most cases this is not practical: the designer is seldom interested in specifying the physical deployment at design time. Moreover, it is also desirable to run the same model on different physical platforms with different deployment schemes and compare the outcomes.
A more flexible approach is to dynamically locate the required components. The tester can thus plug-and-play different components at run-time. When a component is executed, it may require the functionality of some other components. Once it finds these required components, it uses the mechanisms provided by the SVM environment to establish necessary connections between ports. Each pair of connected ports represents a communication channel, which is set up by one component and passively accepted by the other.
To locate a component, there are three schemes, which can also be combined to form more sophisticated queries.
A DNS (Dynamic Naming Service) server runs in the background and records information about all the running components. When a component starts, its SVM solver registers itself with the DNS server and also queries it for the required components. When those components are found, direct connections between them are established, so that it can invoke their functionality. Those components can also invoke its functionality, provided that the connections allow incoming calls.
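A minimal sketch of the register-and-lookup idea is shown below; the NamingService class and its methods are illustrative assumptions, not the actual interface of SVM's DNS server:

```python
# Illustrative sketch of a naming service for locating running components.
# The class and method names are hypothetical, not SVM's real interface.

class NamingService:
    def __init__(self):
        self.registry = {}            # component name -> (host, port)

    def register(self, name, address):
        self.registry[name] = address

    def lookup(self, name):
        return self.registry.get(name)   # None if not yet running

# A starting component registers itself, then queries for what it needs.
dns = NamingService()
dns.register("Clock", ("sim-host-1", 9000))
clock_addr = dns.lookup("Clock")
if clock_addr is not None:
    print("connect ports to Clock at", clock_addr)
```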
Two approaches are possible for a rollback: undoing all the changes since the last checkpoint, or restoring a complete snapshot taken at the checkpoint.
For an FSA model, both approaches are feasible: either all the state changes after the checkpoint are undone, or the old state recorded at the checkpoint is restored. However, for an Extended Statechart model, it is not easy to undo all the changes, since the output actions may modify arbitrary variables.
For the undoing approach, a C++ implementation is available here (the project report by Shuqing Wu and me).
SVM takes the snapshotting approach by making it possible to snapshot any model at any time during its execution. The snapshot is a string which records all the current states of the model, including user-defined variables and objects. For a simple model, a snapshot takes about 6000 bytes of memory.
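A minimal sketch of the snapshot/restore pattern is shown below, using Python's standard pickle module purely for illustration; SVM produces its own snapshot string rather than a pickle:

```python
# Sketch: checkpointing by taking complete snapshots and restoring them.
# pickle is used here only to illustrate the save/restore pattern.
import pickle

class ModelState:
    def __init__(self):
        self.current_state = "initial"
        self.variables = {"count": 0}

model = ModelState()
model.variables["count"] = 42

snapshot = pickle.dumps(model)        # checkpoint: serialize everything
model.variables["count"] = 99         # ... simulation keeps running ...

model = pickle.loads(snapshot)        # rollback: restore the checkpoint
print(model.variables["count"])       # -> 42
```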
Each event is allowed to carry a parameter. If the parameter is a list or tuple, it can be viewed as an arbitrary number of parameters passed with the event.
The meaning of the parameters is a decision of the designer. In the output of a transition, the parameter accompanying the current event can be retrieved. In the guard, such a parameter can also be used to evaluate a run-time condition.
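For illustration only (this is plain Python, not SVM's description-file syntax), a guard and an output can both look into the tuple parameter carried by the current event:

```python
# Hypothetical sketch: an event carries a tuple parameter; the guard
# inspects it to decide whether the transition fires, and the output
# retrieves it again when the transition is taken.

def guard(event_parameter):
    value, label = event_parameter
    return value > 10            # fire only when the value exceeds 10

def output(event_parameter):
    value, label = event_parameter
    print("handling", label, "with value", value)

event_parameter = (42, "temperature")
if guard(event_parameter):
    output(event_parameter)      # -> handling temperature with value 42
```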
For distributed simulation, one SVM instance automatically serializes the parameters before sending them to another one (possibly over a network). The serialization is recursive if the parameters are tuples or objects. The SVM instance on the other side automatically reconstructs the parameters from the serialized strings before passing them to the running component.
Automatic serialization is useful for many applications. For SVM, I implemented it in Python (or Jython), which, like Java, allows dynamic reflection into an object. Another implementation, which does not use any reflection, is built in C++. Readers can find it here (the project report by Shuqing Wu and me).
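The reflection-based idea can be sketched in a few lines of Python; the format produced below is purely illustrative and differs from SVM's actual serialization strings:

```python
# Sketch of recursive, reflection-based serialization: primitives are
# written out directly, sequences recurse element by element, and other
# objects are walked through their attribute dictionary via reflection.

def serialize(value):
    if isinstance(value, (int, float, str, bool, type(None))):
        return repr(value)
    if isinstance(value, (list, tuple)):
        return "(" + ",".join(serialize(v) for v in value) + ")"
    fields = ",".join("%s=%s" % (k, serialize(v))
                      for k, v in vars(value).items())
    return "%s{%s}" % (value.__class__.__name__, fields)

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

print(serialize((1, "a", Point(2, 3))))   # e.g. (1,'a',Point{x=2,y=3})
```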
In theory, every message is sent asynchronously. The sender dispatches the message, which may be transferred via the underlying PVM or CORBA to the other side, and then immediately continues with other jobs; it does not wait for a reply from the callee. On top of PVM or CORBA, it is guaranteed that the callee will eventually receive all the messages, and in order. However, the transfer time is unpredictable.
In practice, asynchronous calls complicate model design: after sending a message, if a reply is expected, the model must remember this state. If communication between models is common, the state space increases dramatically.
SVM supports built-in synchronous calls. A synchronous call can be used in the output, similar to sending an external event. Suppose event e is sent via output port p; the external event is written as p.e. Also suppose a reply event r should be received from input port q afterward. The synchronous call must specify both p.e and q.r. While this call is being executed, the component is blocked until r is actually received from q.
It is the designer's responsibility to make sure that the reply will be received in the future; otherwise the component will be blocked forever. There is no way to wake it up, because an output part is executed as a critical section and global locks are acquired for it, which also blocks other concurrent threads. If a model cannot leave an output part, it is dead.
Like some other extensions, synchronous calls do not add functionality to statecharts. They are just a shorthand notation for practical purposes.
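Conceptually, a synchronous call is just an asynchronous send followed by a blocking wait for the expected reply. The sketch below illustrates this idea in plain Python; send_event, synchronous_call and the incoming queue are hypothetical stand-ins, not SVM's actual machinery:

```python
# Sketch: a synchronous call as "send, then block until the reply arrives".
import queue

incoming = queue.Queue()           # events arriving on input port q

def send_event(port, event, parameter=None):
    print("sent %s.%s" % (port, event))

def synchronous_call(out_port, event, in_port, reply_event):
    send_event(out_port, event)
    while True:                    # block until the expected reply arrives
        port, name, parameter = incoming.get()
        if port == in_port and name == reply_event:
            return parameter
        # a real implementation would keep non-matching messages for later

# Example: the reply is already waiting, so the call returns immediately.
incoming.put(("q", "reply", 123))
print(synchronous_call("p", "request", "q", "reply"))   # -> 123
```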
A message must be tagged with its delivery time. When the receiver receives a message, it uses the Time tag to determine which of three actions to take: if the tag is in the receiver's future, the message is queued and handled when the local time reaches it; if the tag equals the current local time, the message is handled now; if the tag is in the receiver's past, a time skew has occurred and a rollback is triggered (discussed below).
A message also carries a Sender tag, which gives the global ID of its sender (for the PVM implementation the ID is a PVM ID; for the CORBA implementation it is an IDL ID). This tag is especially useful when the behavior of the receiver depends on the sender of the message, or when the receiver must send a result back to the sender.
Theoretically speaking, an output port of a message sender is connected to one or more input ports of its receivers. When the sender sends a message to this output port, all the connected receivers should receive a copy of the message, whether they are interested in it or not.
However, in practice it is usually desirable to explicitly send a message to only some of those receivers; the load on the network is greatly decreased. An example is the Clock component discussed in the next section: it has a timewarp port supporting timewarp simulation, and every other component is connected to this port. It is not practical to broadcast a message to all other components when only one or two of them will actually handle it.
With the Sender tag, the receiver can determine where a message comes from, and explicitly reply to it later.
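Putting the two tags together, a message can be pictured roughly as follows; the field names are illustrative, not SVM's wire format:

```python
# Sketch of the information a message carries: a delivery-time tag, a
# sender tag (global ID), the port/event name and an optional parameter.
from dataclasses import dataclass
from typing import Any

@dataclass
class Message:
    time: float        # delivery time tag
    sender: str        # global ID of the sender (PVM ID or CORBA/IDL ID)
    port: str
    event: str
    parameter: Any = None

msg = Message(time=17.5, sender="lp-3", port="timewarp", event="notifyidle")
reply_to = msg.sender    # the receiver can reply explicitly to the sender
```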
A special component, Clock, manages time advance in the whole system. Other components cannot advance their local time unless they receive a notification from the Clock. The Clock also broadcasts the global time whenever it is changed.
The Clock maintains the local times of all the other components. Those components cannot assume their own time without synchronizing with the Clock.
To retrieve its local time, the component uses a synchronous call on its timewarp port. (Suppose this component has an in/out port called timewarp, which is connected to the in/out port timewarp of the Clock component.)
Since a component has to get its time frequently with this synchronous call, the performance of the Clock is crucial. If the Clock component crashes, all the other components are blocked, because the reply to the synchronous call will never be returned.
On the Clock side, it must handle this time-query event no matter which state it is in (the Clock itself is also a component written in the DES formalism). In its description file this behavior is written as a transition; the textual model description is a 1:1 representation of the graphical semantics.
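As a rough illustration of this logic only (in Python, with hypothetical names; not SVM's description-file syntax), the handler amounts to looking up the stored local time of the asking component and replying to it:

```python
# Sketch of the Clock's handling of a time query, applicable in any state:
# look up the asking component's local time and reply on the timewarp port.

local_times = {"lp-1": 4.0, "lp-2": 7.5}    # maintained by the Clock

def on_time_query(sender):
    reply_time = local_times[sender]
    # reply synchronously to the component that asked
    print("reply to %s on port timewarp: time = %s" % (sender, reply_time))

on_time_query("lp-2")   # -> reply to lp-2 on port timewarp: time = 7.5
```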
A component asynchronously schedules an event in its future by sending the message schedule. to the Clock (the port is schedule; the event name is omitted). The scheduled time is sent as a parameter of the message.
The Clock should not immediately notify the component, which may still be busy handling events at its current local time. Instead, it keeps this scheduling message in a priority queue. (There is a priority queue, ordered by scheduled time, for each connected component.)
When a component has finished handling all the events at its current time (i.e., its current queue becomes empty), it synchronously sends the message timewarp.notifyidle to the Clock and waits for the reply timewarp.goahead.
On receiving timewarp.notifyidle, the Clock checks whether the idle component has scheduled an event in its future. If so, it advances that component's local time to the earliest scheduled time and notifies it with timewarp.goahead.
If the idle component has not scheduled any event, the Clock simply sends timewarp.goahead to it. The component will then be truly idle, waiting for later incoming events.
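A rough sketch of this bookkeeping on the Clock side is shown below, with a heapq-based priority queue per component; all names are hypothetical, and the real Clock is itself written as an SVM model:

```python
# Sketch: per-component priority queues of scheduled times, and the Clock's
# reaction to the schedule and timewarp.notifyidle messages.
import heapq

schedule_queues = {}    # component id -> heap of scheduled times
local_times = {}        # component id -> local time known to the Clock

def send(component, port, event, parameter):
    print("to %s: %s.%s %s" % (component, port, event, parameter))

def on_schedule(sender, scheduled_time):
    heapq.heappush(schedule_queues.setdefault(sender, []), scheduled_time)

def on_notifyidle(sender):
    pending = schedule_queues.get(sender)
    if pending:                             # an event was scheduled
        next_time = heapq.heappop(pending)
        local_times[sender] = next_time     # advance its local time
        send(sender, "timewarp", "goahead", next_time)
    else:                                   # truly idle: just let it wait
        send(sender, "timewarp", "goahead", None)

on_schedule("lp-1", 12.0)
on_notifyidle("lp-1")       # -> to lp-1: timewarp.goahead 12.0
```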
Rollback is an important issue in timewarp.
It is important for a component to notice when a message it receives is tagged with a time in its past. In this case, it stops the event handler immediately and sends the message timewarp.timeskew to the Clock.
To resolve a time skew, a rollback of this component is necessary. Its rollback also affects other components that have received, or will receive, messages it sent after the rollback time. So the component must remember all of its influencees when sending out messages. It sends the set of affected influencees with the message timewarp.timeskew to the Clock, together with the rollback time as another parameter.
When it receives the timewarp.timeskew message, the Clock uses a set A to record all affected components. Another set R is used to record rolled-back components, which is initialized to be empty: R = ∅.

The Clock broadcasts timewarp.rollback to all the components in A, including the one that first discovered the time skew. The rollback time is sent as a parameter.
Every component that receives this rollback message must immediately roll back to exactly the rollback time. All the events received after the rollback time that were sent from affected components are deleted, because those senders roll back to a time before these messages were sent. However, events from other components that are not affected should be kept and handled later.
When it is done, the component sends a timewarp.rollbacked message back to the Clock and waits for the timewarp.goahead message. It should not immediately start handling new events, because other affected components may not have rolled back yet and may still be sending out erroneous messages.
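A sketch of the component side of a rollback, with hypothetical data structures (the real component restores an SVM snapshot string):

```python
# Sketch: on timewarp.rollback, restore the snapshot at the rollback time,
# drop queued events sent after that time by affected senders, and keep
# events from unaffected components for later handling.

def on_rollback(component, rollback_time, affected):
    component["state"] = component["snapshots"][rollback_time]   # restore
    component["queue"] = [
        msg for msg in component["queue"]
        if not (msg["sender"] in affected and msg["time"] > rollback_time)
    ]
    # report completion and wait for timewarp.goahead before resuming
    print("timewarp.rollbacked sent; waiting for timewarp.goahead")

comp = {
    "state": None,
    "snapshots": {5.0: "state@5"},
    "queue": [{"sender": "lp-2", "time": 6.0}, {"sender": "lp-9", "time": 6.5}],
}
on_rollback(comp, 5.0, affected={"lp-2"})
print(comp["queue"])    # only the event from the unaffected lp-9 remains
```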
One fact is very important: these rolled-back components may also send back their own influencees. If more affected components are found in this way, they are added to A and timewarp.rollback is sent to them as well.

After broadcasting the rollback messages, the Clock listens for timewarp.rollbacked. When this message is received, the sender is added to R. If more affected components are returned with it, they are added to A.

When R = A, the rollback process is complete. The Clock broadcasts the timewarp.goahead message to all the components in A.
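The bookkeeping with the two sets A and R might be sketched as follows (hypothetical Python, not the actual Clock model):

```python
# Sketch of the Clock's rollback coordination: 'affected' plays the role of
# A (components that must roll back), 'rolled_back' the role of R.

affected = set()        # A: components that must roll back
rolled_back = set()     # R: components that have reported timewarp.rollbacked

def send(component, port, event, parameter):
    print("to %s: %s.%s %s" % (component, port, event, parameter))

def on_timeskew(reporter, influencees, rollback_time):
    new = ({reporter} | set(influencees)) - affected
    affected.update(new)
    for component in new:                     # broadcast timewarp.rollback
        send(component, "timewarp", "rollback", rollback_time)

def on_rollbacked(sender, more_influencees, rollback_time):
    rolled_back.add(sender)
    on_timeskew(sender, more_influencees, rollback_time)   # A may grow
    if rolled_back == affected:                            # R == A: done
        for component in affected:
            send(component, "timewarp", "goahead", None)

on_timeskew("lp-2", ["lp-3"], 8.0)
on_rollbacked("lp-2", [], 8.0)
on_rollbacked("lp-3", [], 8.0)    # now R == A, goahead is broadcast
```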
From version 0.3, SVM (and jSVM, implemented in Jython) will have timewarp support that makes building timewarp simulations easy. Changing a simulation of another kind to timewarp only requires adding a few deployment keywords to some of the participating components.
Current stable version is 0.23, which is available here.
The new version will be released later this summer.
Special thanks to Prof. Hans Vangheluwe for his excellent supervision and advice!