Useful P Modelling Patterns
The P programming language is a state machine-based programming language which is specifically designed for modelling distributed systems. It is used to identify potential data consistency issues and system design failures quickly and cheaply. The project is documented at https://p-org.github.io/P/ (opens in a new tab). Whilst P is much more accessible to the average software developer than other formal modelling tools such as TLA+ (opens in a new tab), the tutorials provided don't cover all of the scenarios that you're likely to want to model when designing real-world systems. This post describes a couple of patterns that I've frequently found useful while working with P which are not included in the docs.
Pattern 1: Timeouts
The first pattern I want to share is one that you will need in almost every model of a distributed system: the network timeout. It's relatively simple:
if ($) {
updateState;
sendSuccessResponseEventToClientMachine;
} else {
if ($) {
updateState;
}
sendTimeoutResponseEventToClientMachine;
}There are two important things to mention about the code above. The first is that we must allow for the fact that the network timeout might occur on either the request or on the response. That's why, even in the "else" branch of our conditional, we might update the state of the machine serving the request. The request might succeed even if the response doesn't reach the client.
As an aside, I've worked on numerous systems where previous developers have wrapped a network request with a database transaction in the belief that this would ensure consistency between the database and the remote system. If they had modelled those systems using the code above, they'd have quickly realised that using database transactions in the way described provides no consistency guarantees at all.
The second interesting point from the pattern is that request timeouts will occur 50% of the time, as indicated by the use of P's special $ operator in the condition. This is, of course, an unrealistically high timeout frequency. It might seem that selecting a more realistic timeout frequency such as one in ten or one in twenty requests would be more reflective of "real" system behaviour. However, when working with P models, what we usually want to do is explore all of the different paths through the system's state machines as quickly as possible. Reducing the frequency of request timeouts in the model would only mean that we would need to run far more simulations using the P checker to get a confident result than we would if half of the requests time out.
Pattern 2: Lock Expiry
This is an important pattern when modelling any concurrent process requiring use of a lock, e.g. multiple users updating the same file. Locks must always have an expiry date associated with them, in case the lock holder is unable to perform its update and release the lock for some reason. Working out how to expire locks using P was a bit of a puzzle, so I'm sharing my solution. A little more code is required than for modelling the network timeouts.
The timer machine is based on the timer provided as part of the P tutorials, but there are a few differences. Firstly, the tutorial timer is a global timer, whereas our lock timer only applies to a single piece of state. Secondly, the timer from the tutorials can also enter a state in which it never times out. This probably isn't a scenario worth modelling in cases where the timer is running on the same machine as the main process because barring a total system crash we can be fairly confident the timer will eventually expire. If the timer is reached across a network boundary, then timer failure must, of course, be modelled.
Here's the code for the timer machine:
event eStartTimer;
event eCancelTimer;
event eDelayedTimeOut;
event eTimeOut: int;
machine Timer {
var client: machine;
var lockedId: int;
start state Init {
entry (payload: (_client : machine, _lockedId: int)) {
client = payload._client;
lockedId = payload._lockedId;
goto TimerStarted;
}
}
state TimerStarted {
entry {
if($) {
send client, eTimeOut, lockedId;
raise halt;
} else {
send this, eDelayedTimeOut;
}
}
on eDelayedTimeOut goto TimerDelayed;
}
state TimerDelayed {
entry {
if($) {
send client, eTimeOut, lockedId;
raise halt;
} else {
send this, eDelayedTimeout;
}
}
on eCancelTimer do {
raise halt;
}
}
}And here's how to use it inside the client machine:
var timer: Timer;
timer = new Timer((_client = this, _lockedId = someId));If the client machine performs its update successfully and releases the lock, then the timer should be cancelled:
send timer, eCancelTimer;There are a few important points to make regarding the timer design. The first is that the timer may time out immediately, which seems strange. However, it is important to consider cases such as a delay in the client process resulting in the timer effectively expiring immediately in your model. It's therefore essential that the timer might expire as soon as it is created.
Additionally, you may have noticed that instead of simply using the goto statement to move the timer to the TimerDelayed state, which is what we would usually do when changing the state of a machine internally, we send an eDelayedTimeout event. This gives the P scheduler time to process some other events before returning to the TimerDelayed state, rather than entering the state immediately, which is what you want from a non-blocking timer.
Conclusion
There are a lot more patterns that you'll need to build a complete set of P modelling primitives, but hopefully those described above will come in handy. Personally, I feel that system modelling using formal methods is an under-leveraged practice in the software industry in general. I hope this article will help a few folks to start making useful system models with P.
© Ryan Brown.RSS