Making a mach server

(This is a sort of simplified retelling of http://fdiv.net/2011/01/14/machportt-inter-process-communication with some added lore. You should probably start there.)

Much of the magic behind macOS and iOS (and all the derived xOSs - watch, appleTV, the stripped down versions in cables and touch bars and wherever else these things invariably end up) is performed with the assistance of other processes (including the kernel). Communication between them happens in various ways, but the underlying mechanism is client/server IPC powered by Mach, and the underlying currency is the mach port (mach_port_t). Writing a properly behaving, secure, robust mach server is extraordinarily difficult. This difficulty is compounded by the fact that the documentation hails from the late 1980s, and by the fact that low-level systems management is “boring” to modern hack-and-slash software engineers who Want To Get Things Done. This spawned a few cross-process false starts (CFMachPortRef, NSMachPort, Distributed Objects, and some other curiosities), ultimately culminating in XPC. I haven’t used XPC at all, and am only glancingly familiar with how to use it, so that’s all beyond the scope of this story. But even XPC boils down to mach ports at the end of the day.

A mach port is sort of elusive. It doesn’t help that the people who understand them immediately go off the deep end when they start explaining things. In this context, a mach port is most similar to a file descriptor on the various UNIX flavors, specifically a socket or a pipe. They’re very configurable, so they can emulate a whole slew of behaviors, and they’re a precious commodity: there’s a very limited number available from the kernel, and when they’re exhausted, things generally go very badly. For the remainder of this story, I’ll treat them like sockets (network sockets, if you’d like). As we saw earlier, mach ports can also be handles to resources; IOSurfaceRefs are backed by mach ports, as a real-life example. If you want to inspect the ports currently kicking around on your machine, run sudo lsmp -a and watch your eyes glaze over.

So, sockets. UNIX sockets have a bunch of behaviors that I’m honestly pretty rusty on. Network sockets work by port numbers, so if you know a number (like 80 for HTTP, or 22 for SSH, among thousands of others), you can at least check and see if a service is available. Mach services aren’t like this. They’re resolved by name (reverse-DNS, by convention). There’s surprisingly little cruft involved in either looking up a service or registering one as a server.

Here’s a simple stand-alone do-nothing server (actually doing stuff requires sending and receiving messages, and there are like 7 arguments to mach_msg, so we’ll get to that another time after the kids are in school so I can spend more than 15 seconds thinking about things before being interrupted to make One More Hot Dog). The cool part demonstrated here (for very limited values of cool) is name lookup, which works when our service is running, and doesn’t when it’s not. It’s like magic, and it Just Works! Eventually we’ll work ourselves up to using mig to generate interface stubs, but for now we’ll take it nice and easy.

On to the code:

#include <bootstrap.h>
#include <mach/mach.h>
#include <stdio.h>
#include <unistd.h>
 
int main(const int argc, char **argv)
{
    printf("bootstrap port: %d\n", bootstrap_port);
 
    if (argc == 1) { // "server" mode
        mach_port_t service_port = MACH_PORT_NULL;
 
        kern_return_t kr = KERN_SUCCESS;
 
        // self-test: nothing is registered yet, so this lookup should fail
        // (bootstrap ports are beyond the scope of this story)
        kr = bootstrap_look_up(bootstrap_port, "com.example.test", &service_port);
        printf("looked-up service_port: %d (%x)\n", service_port, kr);
 
        // register the name; on success we get the service's receive right
        kr = bootstrap_check_in(bootstrap_port, "com.example.test", &service_port);
        printf("service_port: %d (%x)\n", service_port, kr);
 
        // now the same lookup should succeed
        kr = bootstrap_look_up(bootstrap_port, "com.example.test", &service_port);
        printf("looked-up service_port: %d (%x)\n", service_port, kr);
 
        printf("server loop\n");
        while (1) // keep the process (and so the registration) alive
            sleep(1);
    } else { // "client" mode
        mach_port_t service_port = MACH_PORT_NULL;
        kern_return_t kr = 0;
 
        kr = bootstrap_look_up(bootstrap_port, "com.example.test", &service_port);
        printf("looked-up service_port: %d (%x)\n", service_port, kr);
    }
 
    return 0;
}

Our throw-away service can be invoked with no arguments (server mode), or with any arguments at all (client mode). If we invoke it in client mode without the server running, we see this:

$ ./service -client
bootstrap port: 1799
looked-up service_port: 0 (44e)

I don’t know what 0x44E is off the top of my head. Error codes are dumb, so I just know that not-0 means not-good. Also, service_port is 0, which means bad (MACH_PORT_NULL). So … this didn’t do anything.

[Editor’s note: bootstrap_strerror() says 0x44E means “Unknown service name”.]

Now let’s fire up the server and see what happens.

$ ./service 
bootstrap port: 1799
looked-up service_port: 0 (44e)
service_port: 3331 (0)
looked-up service_port: 3331 (0)
server loop

Cool! As a self-test, server-mode does a resolve before registering the service (which fails just like the client did before), and then does it again after registering to show that it’s different. Now let’s try that client again while the server is still running.

$ ./service -client
bootstrap port: 1799
looked-up service_port: 4611 (0)

Also cool! It appears to work. Note that the looked-up port number is different: the client got 4611 where the server got 3331. Mach port numbers are like file descriptors in that they’re only unique within a process, so passing the bare number to another process doesn’t mean anything; actually passing a file descriptor (or socket, or mach port) needs some special handling, since port-space is per-process. bootstrap_port appears to be identical in both, perhaps because they were run under the same user session and didn’t do anything fancy when starting up.

In Real Life, some of these interfaces are subject to races. launchd can be used to manage process lifetime stuff somewhat, and some SPIs like bootstrap_look_up2 and audit trailers and entitlements are used to verify that something fishy isn’t going on. I’ll have to dig through headers and see how much of that is public or publicly visible though.