The Marionette remote control protocol lets out-of-process programs communicate with Gecko.
The available instruction set is a superset of what the WebDriver wire protocol provides, with some extra vendor-specific extension commands. However, the protocol does not provide the transport mechanisms described in the W3C WebDriver specification. Instead it uses a more lightweight keep-alive JSON-over-sockets approach.
The Marionette protocol has undergone a few revisions since it was first published. The official Marionette Python client is backwards compatible and does, at the time of writing, work with all published Gecko versions that have the Marionette service built in.
The server component resides inside the Firefox binary and can be enabled using the -marionette flag or the marionette.defaultPrefs.enabled preference. It will open a TCP socket server on the port specified in the marionette.defaultPrefs.port preference, or default to port 2828. It will wait forever for a connection.
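Because Firefox may not be listening yet when a client starts, a client usually retries the connection for a while. The following is a minimal Python sketch, assuming Firefox is already running on the local machine with the -marionette flag; the function name connect, the timeout, and the 0.1 second retry interval are our own hypothetical choices, not part of any Marionette API.

```python
import socket
import time

DEFAULT_PORT = 2828  # used when marionette.defaultPrefs.port is not set


def connect(host: str = "127.0.0.1", port: int = DEFAULT_PORT,
            timeout: float = 60.0) -> socket.socket:
    """Connect to a Marionette server, retrying while the port refuses connections."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            sock = socket.create_connection((host, port), timeout=timeout)
        except ConnectionRefusedError:
            # The server may not be listening yet; give up only after the deadline.
            if time.monotonic() >= deadline:
                raise
            time.sleep(0.1)
            continue
        # Switch to fully blocking mode: replies to commands can take arbitrarily long.
        sock.settimeout(None)
        return sock
```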
The server is asynchronous by design, but message indexing is not enforced in protocol levels 2 and earlier.
The client may be implemented in any language that is able to send and receive data over TCP sockets. A reference client is provided in-tree. Clients may be implemented both synchronously and asynchronously, although the latter is impossible in protocol levels 2 and earlier due to the lack of message indexing.
Changing APIs and the protocol
As part of making Marionette compliant with the WebDriver specification, we occasionally have to make interface or API changes and, more rarely, changes to the data model of the TCP wire protocol. Managing these changes is hard work and requires thinking both forward and backward.
Because the Marionette Python client is used for upgrade tests from one Firefox version to the next, it needs to be compatible with all the Firefoxen that are currently in the release channels. The practical benefit is that we can run the upgrade tests using the most recent client, with all of its bells and whistles, against Firefox Stable when testing whether it upgrades smoothly to Beta, Beta to Aurora, and so on.
The consequence is that we cannot change the server APIs or types in mozilla-central without making a corresponding backwards-compatibility patch to the client. The client will continue to speak to different Marionette servers, and it needs to know how to speak to the API as it was before you changed it if that version is still available in Stable, Beta, or Aurora. Fortunately, we can remove commands entirely and make any wild change we want in the server component, but we need to ensure that the client retains the same level of functionality without breaking any of the API promises it currently offers.
Changing the protocol requires even more due diligence and must also involve bumping the Marionette protocol level.
Connection and message formatting
When the client connects to the server’s socket, a new keep-alive connection is established. The server will immediately send the client this message:
{ "applicationType": "gecko", "marionetteProtocol": version }
- version
- The version field indicates which protocol level the server speaks. Unfortunately, it is not possible to upgrade or downgrade the protocol level the server speaks at runtime.
The client should use this information to fall into the compatibility layer that is necessary to speak to the server. If the client does not implement the necessary protocol level, it must disconnect.
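A client can make that decision as soon as it has decoded the greeting. This is a minimal Python sketch; the function name select_protocol_level and the set of supported levels are hypothetical, standing in for whatever compatibility layers a real client implements.

```python
from typing import Any, Dict

# Hypothetical: the protocol levels this client has compatibility code for.
SUPPORTED_LEVELS = frozenset({2, 3})


def select_protocol_level(greeting: Dict[str, Any]) -> int:
    """Return the server's protocol level, or raise if the client cannot speak it."""
    level = greeting.get("marionetteProtocol")
    if level not in SUPPORTED_LEVELS:
        # The level cannot be changed at runtime, so the client must disconnect.
        raise ConnectionError("unsupported Marionette protocol level: %r" % (level,))
    return level
```

The caller would use the returned level to pick its message codec, and close the socket when ConnectionError is raised.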
Each message that is sent across the protocol should be prefixed with the length of the message in characters (not counting the prefix itself), followed by a ":" (colon), and then by the message:
18:{"value":"foobar"}
The client is expected to read a few characters, search for the ":" (colon), and then receive the rest of the message, subtracting from the length any characters it has already read past the colon. This part of the protocol is not yet very well defined.
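A minimal Python sketch of this framing, assuming the length prefix counts bytes of the payload. That matches the character count here because json.dumps escapes non-ASCII characters by default, so the payload is pure ASCII. encode_message and recv_message are hypothetical helper names. Rather than buffering overread characters, recv_message reads the prefix one byte at a time, which avoids the subtraction step at the cost of extra recv calls.

```python
import json
import socket


def encode_message(obj) -> bytes:
    """Serialise obj as compact JSON and prefix it with its length and a colon."""
    # ensure_ascii=True (the default) keeps the payload ASCII-only,
    # so its character count equals its byte count.
    payload = json.dumps(obj, separators=(",", ":"))
    return ("%d:%s" % (len(payload), payload)).encode("ascii")


def recv_message(sock: socket.socket):
    """Read one length-prefixed JSON message from sock and decode it."""
    # Read single bytes until the ":" that separates the length from the body.
    prefix = b""
    while True:
        ch = sock.recv(1)
        if not ch:
            raise ConnectionError("socket closed while reading length prefix")
        if ch == b":":
            break
        prefix += ch
    remaining = int(prefix.decode("ascii"))
    # recv may return fewer bytes than requested, so loop until the body is complete.
    chunks = []
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return json.loads(b"".join(chunks).decode("utf-8"))
```

Encoding {"value": "foobar"} with encode_message produces exactly the 18:{"value":"foobar"} example above.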
Protocol Level 3
Protocol level 3 introduced the concept of message sequencing to Marionette. Message sequencing allows Marionette to provide an asynchronous, parallel pipelining user-facing interface, limit chances of payload race conditions, and remove stylistic inconsistencies in how commands and responses are dispatched internally.
Clients that deliver a blocking WebDriver interface are still expected not to send further command requests before the response to the last command has come back, but if they do so anyway, through a programming error or otherwise, no harm will be done. This guards against bugs such as bug 1207125.
Protocol level 3 formalises the command and response concepts and applies them to emulator callbacks. Through the new message format, Marionette is able to provide two-way parallel communication. In other words, the server can instruct the client to perform a command in a non-ad-hoc way.
runEmulatorCmd and runEmulatorShell are both turned into command instructions originating from the server. This resolved a lot of technical debt in the server code, because they are no longer special-cased to circumvent the dispatching technique used for all other commands. Commands may originate from either the client or the server, with parallel pipelining enforced through message sequencing:
           client       server
              |            |
    msgid=1   |----------->|
              |  command   |
              |            |
    msgid=2   |<-----------|
              |  command   |
              |            |
    msgid=2   |----------->|
              |  response  |
              |            |
    msgid=1   |<-----------|
              |  response  |
              |            |
The protocol now consists of a command message and the corresponding response message. A response message must always be sent in reply to a command message.
This means that the server implementation does not need to send replies in the same order as it received the commands. If it receives multiple messages, the server may reply in any order. Client implementations are therefore strongly advised to take this into account when implementing the client end of this wire protocol.
This is required for pipelining messages. On the server side, some functions are fast and some are slow. If the server had to reply in order, the slow functions would delay the other replies even when their execution had already completed.
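Since replies can arrive out of order, a client has to look each response up by its msgid rather than assume it answers the most recent command. The following is a minimal Python sketch of such a lookup table; the class PendingCommands and its method names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


class PendingCommands:
    """Match level 3 responses to in-flight commands by msgid."""

    def __init__(self) -> None:
        self._pending: Dict[int, str] = {}
        # msgid -> (error, result) for every command that has been answered
        self.completed: Dict[int, Tuple[Optional[dict], Any]] = {}

    def sent(self, msgid: int, command: str) -> None:
        """Remember that a command with this msgid is awaiting a reply."""
        if msgid in self._pending:
            raise ValueError("msgid %d is already in flight" % msgid)
        self._pending[msgid] = command

    def received(self, response: List[Any]) -> str:
        """Record a response and return the name of the command it answers."""
        msg_type, msgid, error, result = response
        if msg_type != 1:
            raise ValueError("not a response message: %r" % (response,))
        if msgid not in self._pending:
            raise ValueError("response for unknown msgid %d" % msgid)
        self.completed[msgid] = (error, result)
        return self._pending.pop(msgid)
```

If a slow command is sent as msgid 1 and a fast one as msgid 2, the response to 2 may arrive first, and the table still routes each result correctly. Commands originating from the server (type 0 messages arriving at the client) would need a separate dispatch path, which this sketch does not cover.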
Command Message
The command message is a four-element JSON array, as shown below:
[type, msgid, command, parameters]
- type
- Must be 0 (integer). This indicates that the message is the command message.
- msgid
- A 32-bit unsigned integer. This number is used as a sequence number that uniquely identifies a pair of command and response messages. The remote end will reply with a corresponding response carrying the same message ID.
- command
- A string identifying the RPC method or command to execute.
- parameters
- An arbitrary JSON serialisable object.
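The protocol does not say how msgids are chosen. The following minimal Python sketch assumes a simple counter that starts at 1 and wraps within the 32-bit unsigned range, which keeps IDs unique as long as fewer than 2**32 commands are in flight at once. msgid_sequence and make_command are hypothetical names, and getTitle below is only an illustrative command string.

```python
import itertools
from typing import Any, Iterator, List

MAX_MSGID = 2 ** 32  # msgid is a 32-bit unsigned integer


def msgid_sequence(start: int = 1) -> Iterator[int]:
    """Yield message IDs, wrapping around within the 32-bit unsigned range."""
    for n in itertools.count(start):
        yield n % MAX_MSGID


def make_command(msgid: int, command: str, parameters: Any) -> List[Any]:
    """Build a level 3 command message: [type, msgid, command, parameters]."""
    if not 0 <= msgid < MAX_MSGID:
        raise ValueError("msgid must fit in 32 unsigned bits")
    # Type 0 marks the message as a command.
    return [0, msgid, command, parameters]
```

For a fresh sequence ids = msgid_sequence(), make_command(next(ids), "getTitle", {}) yields [0, 1, "getTitle", {}], which can then be length-prefixed and sent.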
Response Message
The response message is a four-element JSON array, as shown below:
[type, msgid, error, result]
- type
- Must be 1 (integer). This indicates that the message is the response message.
- msgid
- A 32-bit unsigned integer. This corresponds to the command message’s msgid.
- error
- If the command executed correctly, this field is null. If an error occurred on the server side, this field is an error object.
- result
- The result object associated with the command, if the command executed correctly. If an error occurred on the server side, this field is null.
The structure of the result entry can vary, but it generally follows the return types given by the WebDriver specification. Strings, numbers, null/undefined, and booleans are wrapped in value objects. Other complex types such as arrays and dictionaries/objects are returned directly.
The historical background is that some JSON parsers do not consider primitives JSON serialisable, and that this was the format of protocol level 2. This means the result data remains exactly the same for both protocol levels 2 and 3, except for the extra metadata level 3 provides.
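A client can validate the envelope before looking at the payload. The following minimal Python sketch checks the type field and separates the error path from the success path; the function name split_response is hypothetical, and unwrapping value objects is deliberately left to the caller, which knows what each command returns.

```python
from typing import Any, List, Optional, Tuple


def split_response(message: List[Any]) -> Tuple[int, Optional[dict], Any]:
    """Validate a level 3 response message and return (msgid, error, result)."""
    if not isinstance(message, list) or len(message) != 4:
        raise ValueError("response must be a four-element array")
    msg_type, msgid, error, result = message
    if msg_type != 1:
        raise ValueError("expected type 1 (response), got %r" % (msg_type,))
    if error is not None:
        # Server-side failure: per the protocol, result is null in this case.
        return msgid, error, None
    return msgid, None, result
```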
Error Object
An error object is a serialisation of a WebDriver error, and is structured like this:
{ "error": "invalid session id", "message": "No active session with id 1234", "stacktrace": "" }
All fields of the error object are required by the WebDriver specification, so the stacktrace and message fields are always present, although they may be empty strings. The error field, on the other hand, is guaranteed to be one of the JSON error codes defined in WebDriver.
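A client will typically turn an error object into an exception. The following is a minimal Python sketch; the class WebDriverError is hypothetical and is not the exception hierarchy of the official Python client.

```python
class WebDriverError(Exception):
    """Hypothetical client-side exception built from a WebDriver error object."""

    REQUIRED_FIELDS = ("error", "message", "stacktrace")

    def __init__(self, obj: dict) -> None:
        missing = [f for f in self.REQUIRED_FIELDS if f not in obj]
        if missing:
            raise ValueError("malformed error object, missing: %s" % ", ".join(missing))
        # "error" is a WebDriver JSON error code; message and stacktrace may be "".
        self.code = obj["error"]
        self.message = obj["message"]
        self.stacktrace = obj["stacktrace"]
        super().__init__("%s: %s" % (self.code, self.message) if self.message else self.code)
```

Raising WebDriverError(error) for the example error object above produces the message "invalid session id: No active session with id 1234".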
Value Object
A value object is a JSON object containing a single key, value, with a JSON-serialisable value, like this:
{"value": "foo"}
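Unwrapping a value object is a one-line check, but it is only a heuristic: a command that legitimately returns a dictionary whose only key is value would be misread. The following minimal Python sketch (unwrap_value is a hypothetical name) therefore assumes the caller applies it only to commands known to return primitives.

```python
from typing import Any


def unwrap_value(result: Any) -> Any:
    """Return the primitive inside a value object, or result unchanged."""
    # Heuristic: treat a dict whose only key is "value" as a value object.
    if isinstance(result, dict) and set(result) == {"value"}:
        return result["value"]
    return result
```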
List of Commands
A full list of commands is enumerated in the GeckoDriver#commands dictionary in testing/marionette/driver.js.
Protocol Level 2
Protocol level 2 did not have message sequencing as laid out in the previous section; command results were instead returned directly. This means there is no distinction between, for example, value responses and error responses. The client can instead test for the existence of the error key to detect whether an error occurred. For other response types, the client needs to know what each command returns in order to deserialise it.
Generally, commands that return single values, such as a string, a number, or a boolean, are wrapped in value objects. Complex objects such as arrays and dictionaries/objects are left untouched. This also remains true for the result field, the fourth entry (index 3) of the level 3 response message.
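Because level 2 has no envelope, error detection comes down to checking for the error key. The following is a minimal Python sketch; the function name parse_level2_response is hypothetical, and since this section does not describe the rest of a level 2 error payload, the sketch reports the whole body rather than assuming its structure.

```python
from typing import Any


def parse_level2_response(body: Any) -> Any:
    """Return a protocol level 2 response body, raising if it signals an error."""
    # Level 2 has no [type, msgid, error, result] envelope, so the only
    # failure signal is the presence of an "error" key in the body.
    if isinstance(body, dict) and "error" in body:
        raise RuntimeError("server returned an error: %r" % (body,))
    return body
```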
Protocol Level 1
TODO