
Structure of monolithic and microkernel-based operating systems
A 'microkernel' is a minimal
computer operating system kernel which, in its purest form, provides no operating-system services at all, only the ''mechanisms'' needed to implement such services, such as low-level
address space management,
thread management, and
inter-process communication (IPC). The microkernel is the only part of the system executing in
kernel mode. The actual operating-system services are provided by
user-mode ''servers''. These include
device drivers,
protocol stacks,
file systems and
user-interface code.
This results in a system structure that is drastically different from the more established
monolithic kernels. The latter traditionally have a vertically-layered structure, where applications obtain services by performing a specific
system call for each service. In contrast, a microkernel-based system features a horizontal structure, where system services are obtained by executing an IPC system call addressed to a particular server.
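The difference between the two structures can be illustrated with a small model. This is only a sketch: the class names, the `ipc_call` method and the dictionary message format are hypothetical and do not correspond to any real kernel interface.

```python
# Hypothetical model; all names and the message format are illustrative only.

class MonolithicKernel:
    """Each service has its own system call, implemented inside the kernel."""

    def __init__(self):
        self.files = {"/etc/motd": b"hello"}

    def sys_read(self, path):
        # One dedicated kernel entry point per service.
        return self.files[path]


class FileServer:
    """A user-mode server: it owns the file data, the kernel does not."""

    def __init__(self):
        self.files = {"/etc/motd": b"hello"}

    def handle(self, message):
        if message["op"] == "read":
            return {"status": "ok", "data": self.files[message["path"]]}
        return {"status": "error", "data": b""}


class Microkernel:
    """Offers a single IPC call and only routes messages to servers."""

    def __init__(self):
        self.endpoints = {}

    def register(self, name, server):
        self.endpoints[name] = server

    def ipc_call(self, endpoint, message):
        return self.endpoints[endpoint].handle(message)


mono = MonolithicKernel()
micro = Microkernel()
micro.register("fs", FileServer())

direct = mono.sys_read("/etc/motd")
via_ipc = micro.ipc_call("fs", {"op": "read", "path": "/etc/motd"})
```

In the model, adding a new service to the microkernel-based system means registering another server, while the monolithic kernel would need another system call.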
Microkernels are closely related to
exokernels. They also have much in common with
hypervisors, but the latter make no claim to minimality, and are specialized for supporting
virtual machines. The
L4 microkernel is frequently used as a hypervisor, which indicates that a microkernel is a possible implementation of a hypervisor. The term
nanokernel is historically used to differentiate from earlier microkernels which contained actual system services, but the ''minimality'' principle used by
Jochen Liedtke in the design of the
L4 microkernel implies that these terms have the same meaning; microkernel is the modern terminology.
Introduction
Early operating system kernels were rather small, partly because computer memory was limited. As the capability of computers grew, the number of devices the kernel had to control also grew. Early versions of
UNIX had kernels of quite modest size, even though those kernels contained device drivers and file system managers. When address spaces increased from 16 to 32 bits, kernel design was no longer cramped by the hardware architecture, and kernels began to grow. (See
History of Unix).
Berkeley UNIX (
BSD) began the era of big kernels. In addition to operating a basic system consisting of the CPU, disks and printers, BSD started adding additional
file systems, a complete
TCP/IP networking system, and a number of "virtual" devices that allowed the existing programs to work invisibly over the network. This growth continued for several decades, resulting in kernels with millions of lines of
source code. As a result of this growth, kernels were more prone to bugs and became increasingly difficult to maintain.
The microkernel was designed to address the increasing growth of kernels and the difficulties that came with them. In theory, the microkernel design allows for easier management of code due to its division into
user-space services. This also allows for increased security and stability resulting from the reduced amount of code running in
kernel mode.
For example, if a networking service crashed due to
buffer overflow, only the networking service's memory would be corrupted, leaving the rest of the system still functional. On a traditional monolithic kernel, the overflow could possibly corrupt the memory of other
drivers and possibly the kernel itself, which could crash the entire system.
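The example can be modelled roughly as follows. Python cannot overflow a buffer the way C code can, so the sketch simulates memory with byte arrays; the `AddressSpace` class and the 8-byte regions are assumptions made purely for illustration.

```python
OVERSIZED_PACKET = b"A" * 12   # 12 bytes aimed at an 8-byte buffer

# Monolithic model: one shared memory. Bytes 0-7 hold the network buffer,
# bytes 8-15 hold state belonging to an unrelated driver.
shared = bytearray(8) + bytearray(b"DRIVERST")
shared[0:len(OVERSIZED_PACKET)] = OVERSIZED_PACKET   # unchecked copy
monolithic_driver_state = bytes(shared[8:16])        # now partly overwritten


class AddressSpace:
    """Simulated private memory; out-of-range writes fault, as an MMU would."""

    def __init__(self, size):
        self.memory = bytearray(size)

    def write(self, start, data):
        if start < 0 or start + len(data) > len(self.memory):
            raise MemoryError("access outside address space")
        self.memory[start:start + len(data)] = data


# Microkernel model: the networking server and the driver have separate memory.
network = AddressSpace(8)
driver = AddressSpace(8)
driver.write(0, b"DRIVERST")

try:
    network.write(0, OVERSIZED_PACKET)
    network_crashed = False
except MemoryError:
    network_crashed = True   # only the networking server is affected
```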
Inter-process communication
IPC is any mechanism which allows separate processes to communicate with each other, usually by sending
messages. This allows the operating system to be built of a number of small programs called servers, which are used by other programs on the system. Most or all support for peripheral hardware is handled in this fashion, with servers for networking, file systems, graphics, etc.
Jochen Liedtke in his
L4 microkernel pioneered techniques that led to an order-of-magnitude reduction of IPC costs.
[1] These include an IPC system call that supports a send as well as a receive operation, making all IPC synchronous in order to avoid the overhead of buffering in the kernel and multiple copying, and passing as much data as possible in registers. Furthermore, Liedtke introduced the concept of the ''lazy process switch'', where during an IPC execution an (incomplete)
context switch is performed from the sender directly to the receiver. If, as in L4, part or all of the message is passed in registers, this transfers the in-register part of the message without any copying at all. Furthermore, the overhead of invoking the scheduler is avoided; this is especially beneficial in the common case where IPC is used in an
RPC-type fashion by a client invoking a server. Another optimization, called ''lazy scheduling'', avoids traversing scheduling queues during IPC by leaving threads that block during IPC in the ready queue. Once the scheduler is invoked, it moves such threads to the appropriate waiting queue. As in many cases a thread gets unblocked before the next scheduler invocation, this approach saves significant work. Similar approaches have since been adopted by
QNX and
Minix 3.
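The idea behind lazy scheduling can be sketched in a few lines. This is a simplified model under assumed names (`LazyScheduler`, `queue_moves`), not L4's actual data structures: blocking only sets a flag, and the ready queue is cleaned up when the scheduler next runs.

```python
from collections import deque


class Thread:
    def __init__(self, name):
        self.name = name
        self.blocked = False


class LazyScheduler:
    def __init__(self, threads):
        self.ready = deque(threads)
        self.waiting = []
        self.queue_moves = 0   # how often a thread was moved to the waiting list

    def block(self, thread):
        # IPC path: mark the thread, but leave it in the ready queue.
        thread.blocked = True

    def unblock(self, thread):
        thread.blocked = False
        if thread in self.waiting:   # only if the scheduler ran in between
            self.waiting.remove(thread)
            self.ready.append(thread)

    def pick_next(self):
        """Move stale entries to the waiting list, then return a runnable thread."""
        while self.ready:
            thread = self.ready.popleft()
            if thread.blocked:
                self.waiting.append(thread)
                self.queue_moves += 1
                continue
            self.ready.append(thread)   # round-robin: requeue at the tail
            return thread
        return None


client, server = Thread("client"), Thread("server")
sched = LazyScheduler([client, server])

# Many IPC round trips in which the client blocks and is unblocked again
# before the scheduler is invoked: no queue is touched.
for _ in range(1000):
    sched.block(client)
    sched.unblock(client)

first = sched.pick_next()
```

After the loop, `queue_moves` is still zero, which is the saving the technique aims for; an eager scheduler would have moved the client between queues on every round trip.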
With asynchronous messaging, the message sender places data on a queue. The message sender is not blocked but continues to run when sending a message, unless the queue is full. This requires buffering in the kernel, which means that messages are copied twice (sender to kernel and kernel to receiver). The
Berkeley sockets model from the UNIX world, which follows the earlier UNIX byte-stream
pipe mechanism, fits this model.
POSIX adds asynchronous message queues, which queue and send discrete messages.
[1]
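A minimal model of such a queue makes the two copies visible. The `AsyncChannel` class and its copy counter are hypothetical; a real kernel would block the sender when the queue is full, which the sketch reports as `False` instead.

```python
from collections import deque


class AsyncChannel:
    """Bounded in-kernel message queue (illustrative model only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.copies = 0

    def send(self, user_buffer):
        """Return False where a real sender would block (queue full)."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(bytes(user_buffer))   # copy 1: sender to kernel
        self.copies += 1
        return True

    def receive(self):
        if not self.queue:
            return None
        kernel_buffer = self.queue.popleft()
        self.copies += 1                         # copy 2: kernel to receiver
        return bytearray(kernel_buffer)


channel = AsyncChannel(capacity=2)
sender_buffer = bytearray(b"ping")

sent = [channel.send(sender_buffer) for _ in range(3)]
sender_buffer[:] = b"XXXX"      # the sender may reuse its buffer immediately
received = channel.receive()    # the queued copy is unaffected
```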
In a client-server system, most communication is essentially synchronous, even if using asynchronous primitives, as the typical operation is a client invoking a server and then waiting for a reply. As it also lends itself to more efficient implementation, modern microkernels (starting with
L4 and including
Minix 3) only provide a synchronous IPC primitive. Asynchronous IPC can be implemented on top by using helper threads.
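The helper-thread approach might look like the following sketch. Here `sync_call` merely stands in for a blocking IPC system call, and `AsyncRequest` and `echo_server` are invented names used for illustration.

```python
import threading


def sync_call(server, message):
    """Stand-in for synchronous IPC: returns only once the server has replied."""
    return server(message)


class AsyncRequest:
    """Runs one synchronous call on a helper thread so the client need not block."""

    def __init__(self, server, message):
        self.reply = None
        self._helper = threading.Thread(target=self._run, args=(server, message))
        self._helper.start()

    def _run(self, server, message):
        self.reply = sync_call(server, message)

    def wait(self):
        self._helper.join()
        return self.reply


def echo_server(message):
    return message.upper()


request = AsyncRequest(echo_server, "hello")
# The client thread is free to do other work here.
result = request.wait()
```

The helper thread blocks in the synchronous call on the client's behalf, so no message buffering is needed inside the kernel.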
Servers
Microkernel servers are programs like any others, except that the kernel grants some of them privileges to interact with parts of physical memory that are otherwise off limits to most programs. This allows some servers, particularly device drivers, to interact directly with hardware.
A basic set of servers for a general-purpose microkernel includes file system servers, device driver servers, networking servers, display servers, and user interface device servers. This set of servers (drawn from
QNX) provides roughly the set of services offered by a monolithic UNIX kernel. The necessary servers are started at system startup and provide services, such as file, network, and device access, to ordinary application programs. With such servers running in the environment of a user application, server development is similar to ordinary application development, rather than the build-and-boot process needed for kernel development.
Additionally, many "crashes" can be corrected by simply
stopping and restarting the server. (In a traditional system, a crash in any of the kernel-resident code would result in the entire machine crashing, forcing a reboot). However, part of the system state is lost with the failing server, hence this approach requires applications to cope with failure. A good example is a server responsible for
TCP/IP connections: If this server is restarted, applications will experience a "lost" connection, a normal occurrence in a networked system. For other services, failure is less expected and may require changes to application code. For QNX, restart capability is offered as the
QNX High Availability Toolkit.
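The restart scenario can be sketched as follows. The `Supervisor` and `TcpServer` classes are hypothetical and unrelated to the QNX toolkit; the point is only that connection state disappears with the old server instance, and the client copes by reconnecting.

```python
class ConnectionLost(Exception):
    pass


class TcpServer:
    """Toy networking server; its connection table is lost on restart."""

    def __init__(self):
        self.connections = {}
        self.next_id = 1

    def open(self, host):
        conn = self.next_id
        self.next_id += 1
        self.connections[conn] = host
        return conn

    def send(self, conn, data):
        if conn not in self.connections:
            raise ConnectionLost(conn)
        return len(data)


class Supervisor:
    """Replaces a failed server with a fresh instance."""

    def __init__(self, factory):
        self.factory = factory
        self.server = factory()
        self.restarts = 0

    def restart(self):
        self.server = self.factory()
        self.restarts += 1


def send_with_reconnect(supervisor, conn, host, data):
    """Client-side handling: treat a lost connection as a normal event."""
    try:
        return conn, supervisor.server.send(conn, data)
    except ConnectionLost:
        conn = supervisor.server.open(host)
        return conn, supervisor.server.send(conn, data)


supervisor = Supervisor(TcpServer)
conn = supervisor.server.open("example.org")
supervisor.restart()   # simulate a crash followed by a restart
new_conn, sent = send_with_reconnect(supervisor, conn, "example.org", b"data")
```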
In order to make all servers restartable, some microkernels have concentrated on adding various
database-like techniques like
transactions,
replication and
checkpointing in order to preserve essential state across single server restarts. An example is
ChorusOS, which was targeted at high-availability applications in the
telecommunications world. Chorus included features to allow any "properly written" server to be restarted at any time, with clients using those servers being paused while the server brought itself back into its original state. However, such kernel features are incompatible with the minimality principle, and are therefore not provided in modern microkernels, which instead rely on appropriate user-level protocols.
Device Drivers
Device drivers frequently perform
direct memory access (DMA), and therefore can write to arbitrary locations of physical memory, including over kernel data structures. Such drivers must therefore be trusted. It is a common misconception that this means that they must be part of the kernel. In fact, a driver is not inherently more or less trustworthy by being part of the kernel.
While running a device driver in user mode does not necessarily reduce the damage a misbehaving driver can cause, in practice it is beneficial for system stability in the presence of buggy (rather than malicious) drivers: memory-access violations by the driver code itself (as opposed to the device) may still be caught by the memory-management hardware. Furthermore, many devices are not DMA-capable, so their drivers can be made untrusted by running them in user mode. Recently, an increasing number of computers feature
IOMMUs, many of which can be used to restrict a device's access to physical memory.
[2] IBM mainframes have had IOMMUs since the
IBM System/360 Model 67 and
System/370. This also allows user-mode drivers to become untrusted.
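How an IOMMU restricts a device can be shown with a simplified model. The `Iommu` class and its window table are assumptions for illustration; real IOMMUs translate addresses through page tables rather than checking simple ranges.

```python
class Iommu:
    """Toy IOMMU: each device may only touch the ranges mapped for it."""

    def __init__(self):
        self.windows = {}   # device name -> list of (start, end) ranges

    def map(self, device, start, length):
        self.windows.setdefault(device, []).append((start, start + length))

    def allows(self, device, address, length):
        return any(start <= address and address + length <= end
                   for start, end in self.windows.get(device, []))


def dma_write(iommu, memory, device, address, data):
    """Perform a device write only if the IOMMU permits the whole range."""
    if not iommu.allows(device, address, len(data)):
        return False   # transfer blocked
    memory[address:address + len(data)] = data
    return True


memory = bytearray(64)   # bytes 0-31 stand for kernel data in this model
iommu = Iommu()
iommu.map("nic", start=32, length=16)

allowed = dma_write(iommu, memory, "nic", 32, b"packet")
blocked = dma_write(iommu, memory, "nic", 0, b"evil")
```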
Historically, drivers were less of a problem, as the number of devices was small and trusted anyway, so having them in the kernel simplified the design and avoided potential performance problems. This led to the traditional driver-in-the-kernel style of UNIX, Linux, and Windows.
[3]
With the proliferation of various kinds of peripherals, the amount of driver code escalated and in modern operating systems dominates the kernel in terms of code size.
User-mode drivers predate microkernels. The
Michigan Terminal System (MTS), in 1967, supported user-space drivers and was the first operating system designed with that capability.
[4]
Essential components
As a microkernel must allow building arbitrary operating-system services on top, it must provide some core functionality. At the least this includes:
★ some mechanisms for dealing with
address spaces — this is required for managing memory protection;
★ some execution abstraction to manage CPU allocation — typically
threads or
scheduler activations; and
★
inter-process communication — required to invoke servers running in their own address spaces.
This minimal design was pioneered by
Brinch Hansen's
Nucleus and the hypervisor of IBM's
VM. It has since been formalised in Liedtke's ''minimality principle'':
A concept is tolerated inside the microkernel only if moving it outside the kernel, i.e., permitting competing implementations, would prevent the implementation of the system's required functionality.[5]
Everything else can be done in a user program, although device drivers implemented as user programs may require special privileges to access I/O hardware.
Related to the minimality principle, and equally important for microkernel design, is the
separation of mechanism and policy; it is what enables the construction of arbitrary systems on top of a minimal kernel. Any policy built into the kernel cannot be overridden at user level and therefore limits the generality of the microkernel.
[6]
Policy implemented in user-level servers can be changed by replacing the servers (or letting the application choose between competing servers offering similar services).
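As an illustration of the separation, the sketch below uses thread selection as the example policy. The `Dispatcher` class and both policy functions are hypothetical: the dispatcher is the mechanism, and it runs whichever thread the supplied policy selects.

```python
class Dispatcher:
    """Mechanism: runs whichever thread the supplied policy picks."""

    def __init__(self, policy):
        self.policy = policy

    def run(self, threads, steps):
        trace = []
        for _ in range(steps):
            chosen = self.policy(threads, trace)
            trace.append(chosen["name"])
        return trace


def round_robin(threads, trace):
    """Policy 1: take turns, based on how many steps have run so far."""
    return threads[len(trace) % len(threads)]


def highest_priority(threads, trace):
    """Policy 2: always run the thread with the highest priority."""
    return max(threads, key=lambda t: t["priority"])


threads = [{"name": "audio", "priority": 5}, {"name": "batch", "priority": 1}]

rr_trace = Dispatcher(round_robin).run(threads, 4)
prio_trace = Dispatcher(highest_priority).run(threads, 4)
```

Swapping the policy changes system behaviour without touching the dispatcher, which is what replacing a user-level server achieves in a microkernel-based system.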
For efficiency, most microkernels contain schedulers and manage timers, in violation of the minimality principle and the principle of policy-mechanism separation.
Start up (
booting) of a microkernel-based system requires
device drivers, which are not part of the kernel. Typically this means that they are packaged with the kernel in the boot image, and the kernel supports a bootstrap protocol that defines how the drivers are located and started. Some microkernels simplify this by placing some key drivers inside the kernel (in violation of the minimality principle);
LynxOS and the original
Minix are examples. Some even include a
file system in the kernel to simplify booting.
A key component of a microkernel is a good
IPC system. Since all services are performed by user-mode programs, efficient means of communication between programs are essential, far more so than in monolithic kernels. The design of the IPC system makes or breaks a microkernel. To be effective, the IPC system must not only have low overhead, but also interact well with CPU scheduling.
Performance
Obtaining a service is inherently more expensive in a microkernel-based system than a monolithic system. In the monolithic system, the service is obtained by a single system call, which requires two ''mode switches'' (changes of the processor's privilege level). In the microkernel-based system, the service is obtained by sending an IPC message to a server, and obtaining the result in another IPC message from the server. This requires two system calls (a total of four mode switches) plus two
address-space switches, as the server is typically in a different address space from the client. In addition, passing actual data to the server and back may incur extra copying overhead, while in a monolithic system the kernel can directly access the data in the client's buffers.
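The counts above translate into a simple cost model. The cycle figures used below are invented placeholders, not measurements of any real processor; only the multipliers (two versus four mode switches, plus two address-space switches) come from the text.

```python
def monolithic_cost(mode_switch):
    """One system call: enter the kernel and return (two mode switches)."""
    return 2 * mode_switch


def microkernel_cost(mode_switch, address_space_switch, copy_overhead=0):
    """Two IPC system calls (four mode switches) plus two address-space switches."""
    return 4 * mode_switch + 2 * address_space_switch + copy_overhead


# Hypothetical costs in cycles, chosen only to make the arithmetic concrete.
MODE_SWITCH = 100
ADDRESS_SPACE_SWITCH = 300

mono = monolithic_cost(MODE_SWITCH)
micro = microkernel_cost(MODE_SWITCH, ADDRESS_SPACE_SWITCH)
```

With these placeholder numbers the microkernel path costs 1000 cycles against 200, so any reduction in the cost of a single IPC operation is paid back on every service request.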
Performance is therefore a potential issue in microkernel systems. Indeed, the experience of first-generation microkernels such as
Mach and Chorus showed that systems based on them performed very poorly.
[7]
However,
Jochen Liedtke showed that Mach's performance problems were the result of poor design and implementation, and specifically Mach's excessive
cache footprint.
[8]
Liedtke demonstrated with his own
L4 microkernel that through careful design and implementation, and especially by following the minimality principle, IPC costs could be reduced by more than an order of magnitude compared to Mach. L4's IPC performance is unbeaten across a range of architectures.
[9][10][11]
While these results demonstrate that the poor performance of systems based on first-generation microkernels is not representative of second-generation kernels such as L4, this constitutes no proof that microkernel-based systems can be built with good performance. It has been shown that a monolithic Linux server ported to L4 exhibits only a few percent overhead over native Linux.
[12]
However, such a single-server system exhibits few, if any, of the advantages microkernels are supposed to provide by structuring operating-system functionality into separate servers.
A number of commercial multi-server systems exist, in particular the realtime systems
QNX and
Integrity. No comprehensive comparison of performance relative to monolithic systems has been published for those multiserver systems. Furthermore, performance does not seem to be the overriding concern for those commercial systems, which instead emphasize simplicity for the sake of robustness. An attempt to build a high-performance multiserver operating system was the IBM Sawmill Linux project.
[13]
However, this project was never completed.
It has been shown in the meantime that user-level device drivers can come close to the performance of in-kernel drivers even for such high-throughput, high-interrupt devices as Gigabit Ethernet.
[14] This seems to imply that high-performance multi-server systems are possible.
Security
In the security context, the minimality principle of microkernels is a direct consequence of the principle of
least privilege, which means that all code should only have the minimal amount of privileges it requires to provide its required functionality. Consequently, microkernel designs have been used for systems designed for high-security applications, including
EROS,
KeyKOS and military systems.
In 2006 the debate about the potential security benefits of the microkernel design increased.
[15]
Many attacks on computer systems take advantage of bugs in various pieces of software. For instance, one of the common attacks is the
buffer overflow, in which malicious code is "injected" by asking a program to process some data, and then feeding in more data than it stated it would send. If the receiving program does not specifically check the amount of data it received, it is possible that the extra data will be blindly copied into the receiver's memory. This code can then be run under the permissions of the receiver. This sort of bug has been exploited repeatedly, including a number of recent attacks through
web browsers.
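The missing check described above is small. The sketch below assumes a made-up wire format (one length byte followed by the payload) and a fixed 8-byte buffer; Python itself is memory-safe, so the example shows only the validation that an unsafe receiver omits.

```python
BUFFER_SIZE = 8


def receive_checked(packet):
    """Parse a packet made of a 1-byte declared length followed by the payload."""
    declared = packet[0]
    payload = packet[1:]
    # Reject packets that lie about their length or would not fit the buffer.
    if declared > BUFFER_SIZE or len(payload) != declared:
        raise ValueError("length mismatch or payload too large")
    buffer = bytearray(BUFFER_SIZE)
    buffer[:declared] = payload
    return buffer


good = receive_checked(bytes([4]) + b"data")

try:
    receive_checked(bytes([4]) + b"A" * 32)   # claims 4 bytes, sends 32
    rejected = False
except ValueError:
    rejected = True
```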
To see how a microkernel can help address this, first consider the problem of buffer overflow bugs in device drivers. Device drivers are notoriously buggy,
[16] but nevertheless run inside the kernel of a traditional operating system, and therefore have "superuser" access to the entire system.
[17] Malicious code exploiting such a bug can thus take over the entire system, with no boundaries to its access to resources.
[18] For instance, under
open-source monolithic kernels such as
Linux or the
BSDs a successful attack on the networking stack over the Internet could proceed to install a
backdoor that runs a service with arbitrarily high privileges, so that the intruder may abuse the infected machine in any way
[19] and no security check would be applied because the
rootkit is acting from inside the kernel. Even if appropriate steps are taken to prevent this particular attack,
[17] the malicious code could simply copy data directly into other parts of the kernel memory, as it is shared among all the modules in the kernel.
A microkernel system is somewhat more resistant to these sorts of attacks
[20]
for three reasons. For one, an identical bug in a server would allow the attacker to take over only that program, not the entire system: in other words, microkernel designs obey the
principle of least authority. This isolation of "powerful" code into separate servers helps isolate potential intrusions, notably as it allows a CPU's
memory management unit to check for any attempt to copy data between the servers.
Microkernels also tend to run device-driver processes and server processes with user-mode CPU privileges. While supervisor-mode code can perform any operation the hardware can, including writing to write-protected memory, changing the CPU's fundamental data tables and switching to arbitrary address-spaces, user-mode code can only perform those operations deemed safe for application code. So device-driver and server processes running in user-mode under a microkernel system must ask the kernel to perform privileged operations for them, allowing the microkernel to check for safety and security.
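The kernel-mediated check can be modelled as below. The `Kernel` class, the `sys_port_write` call and the per-process port table are hypothetical names chosen for the sketch; they show only that a user-mode driver's privileged request passes through a point where the kernel can refuse it.

```python
class Kernel:
    """Toy kernel that performs port writes on behalf of user-mode drivers."""

    def __init__(self):
        self.io_rights = {}   # process name -> set of permitted I/O ports
        self.ports = {}       # simulated device registers

    def grant_port(self, process, port):
        self.io_rights.setdefault(process, set()).add(port)

    def sys_port_write(self, process, port, value):
        # The kernel checks the request before touching the hardware.
        if port not in self.io_rights.get(process, set()):
            raise PermissionError(f"{process} may not access port {port:#x}")
        self.ports[port] = value


kernel = Kernel()
kernel.grant_port("disk_driver", 0x1F0)

kernel.sys_port_write("disk_driver", 0x1F0, 0xA0)   # permitted

try:
    kernel.sys_port_write("net_driver", 0x1F0, 0x00)   # not granted
    denied = False
except PermissionError:
    denied = True
```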
But the most important reason for the additional security is that the servers are isolated in smaller code libraries, with well-defined interfaces. That means that the code can be audited, as its smaller size makes this easier to do (in theory) than if the same code were simply one module in a much larger system. This may also result in fewer non-security-related bugs, improving overall stability.
Key to the argument is the fact that a microkernel, as a rule, isolates high-privilege code in protected memory because it runs in a separate server. This isolation could likely be applied to a traditional kernel as well. However, it is precisely this mechanism that forces data to be passed around between programs, leading to the microkernel's performance difficulties discussed above. In the past, outright performance was the main concern of most programs. Today this is no longer quite as powerful an argument as it once was, as security problems become endemic in a well-connected world.
[21]
Finally, it should be noted that securing the kernel, although a necessary condition,
[22] is not sufficient to guarantee overall ''system'' security. For instance, if a bug remained in the system's web browser that allowed attack, some
shellcode uploaded through that attack could still legally ask the file system to erase all the browser owner's files via the normal IPC messages. Securing against these sorts of "reasonable requests" is considerably more difficult and requires applying the principle of least authority in the design of the ''entire'' operating system, not just the (micro)kernel. The
EROS microkernel operating system, and its descendants
CapROS and
Coyotos, are research projects that strive to do just that.
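One way to picture least authority at the level of a file server is a capability that names both a subtree and the rights held over it. The sketch is purely illustrative: the `Capability` type, the prefix check and the paths are assumptions and do not reflect how EROS, CapROS or Coyotos represent capabilities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    prefix: str          # subtree the holder may act on
    rights: frozenset    # operations the holder may perform there


class FileServer:
    def __init__(self, files):
        self.files = dict(files)

    def delete(self, capability, path):
        """Delete a file only if the capability covers both path and operation."""
        # Simplification: a real server would also normalise the path.
        if "delete" not in capability.rights:
            return False
        if not path.startswith(capability.prefix):
            return False
        return self.files.pop(path, None) is not None


server = FileServer({
    "/home/u/thesis.txt": b"years of work",
    "/home/u/cache/page.html": b"<html></html>",
})

# The browser is handed authority over its cache directory only.
browser_cap = Capability("/home/u/cache/", frozenset({"read", "write", "delete"}))

outside = server.delete(browser_cap, "/home/u/thesis.txt")
inside = server.delete(browser_cap, "/home/u/cache/page.html")
```

Shellcode running inside the browser could still empty the cache, but a request to delete the user's other files would be refused by the server.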
See also
★ Kernel (computer science)
★ Exokernel, a research kernel architecture with a more minimalist approach to kernel technology.
★ Hybrid kernel
★ Monolithic kernel
★ Nanokernel
★ Trusted computing base
Further reading
★ scientific articles about microkernels (on CiteSeer), including:
★ An Architectural Overview of QNX, Dan Hildebrand, Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures, 1992 - the basic QNX reference.
★ Can We Make Operating Systems Reliable and Secure?, Tanenbaum, A., Herder, J. and Bos, H., Computer, May 2006 - the basic reliable reference.
★ Microkernel Operating System Architecture and Mach, Black, D.L., Golub, D.B., Julin, D.P., Rashid, R.F., Draves, R.P., Dean, R.W., Forin, A., Barrera, J., Tokuda, H., Malan, G., and Bohman, D., J. of Information Processing, March 1992 - the basic Mach reference.
★ MicroKernel page from the Portland Pattern Repository
★ The Tanenbaum-Torvalds Debate, 1992.01.29
★ Linus Torvalds about the microkernels again, 2006.05.09