How PREEMPT_RT works – Realtime Linux

How does PREEMPT_RT achieve its goals, and why does it work the way it does? Was there ever a deliberate design, or did it evolve by chance?

Anyone assuming that kernel developers are a chaotic bunch and that everything works through sheer coincidence will be disappointed. PREEMPT_RT is not the result of randomness – it was built with a clear vision and a master plan.

The KURT project (Kansas University Real-Time), led by Prof. Dr. Douglas Niehaus, began exploring how to transform a general-purpose operating system into a real-time system. Their work culminated in KURT 2.0, built on Linux 2.2.5, which served as the blueprint for what we now know as PREEMPT_RT.

One of the key design concepts was running interrupt handlers as threads – a technique now known as threaded interrupts. Another foundational idea was separating the core kernel logic that handles low-level operations (like interrupt entry code) from peripheral tasks such as programming the disk controller.

This separation introduced the need for an additional locking mechanism. The core kernel uses raw_spinlock_t, a lock that always spins and keeps preemption disabled (and, in its interrupt-disabling variants, interrupts off) while it is held. In contrast, spinlock_t evolved into what is now called a sleeping spinlock: under PREEMPT_RT a contended waiter is scheduled out instead of spinning. The goal was to keep as much code as possible under the scheduler’s control, making it preemptible.

This design allows the scheduler to interrupt the current context and switch to a real-time workload when necessary. Because interrupt handlers are threaded and use spinlock_t, they too can be preempted.
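As a rough illustration of the two ideas combined, the following kernel-side sketch registers a threaded interrupt handler that protects its state with a spinlock_t. The device structure, the function names, and the "demo" label are hypothetical and exist only for this example; the code builds only inside a kernel tree and is not taken from any real driver.

```c
#include <linux/interrupt.h>
#include <linux/spinlock.h>

/* Hypothetical device state, invented for this sketch. */
struct demo_dev {
	spinlock_t lock;	/* a sleeping lock when PREEMPT_RT is enabled */
	unsigned int pending;	/* number of events seen so far */
};

/* Runs in a kernel thread, so the scheduler can preempt it. */
static irqreturn_t demo_irq_thread(int irq, void *data)
{
	struct demo_dev *dev = data;

	spin_lock(&dev->lock);
	dev->pending++;
	spin_unlock(&dev->lock);

	return IRQ_HANDLED;
}

static int demo_setup_irq(struct demo_dev *dev, unsigned int irq)
{
	spin_lock_init(&dev->lock);

	/*
	 * No primary handler is supplied, so IRQF_ONESHOT is required:
	 * the interrupt line stays masked until the thread has finished.
	 */
	return request_threaded_irq(irq, NULL, demo_irq_thread,
				    IRQF_ONESHOT, "demo", dev);
}
```

Because the handler is an ordinary schedulable thread, a higher-priority real-time task can preempt it even while it holds dev->lock on a PREEMPT_RT kernel.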

All of this led to a working prototype based on Linux v2.6.16 [0]. Later, in v2.6.34, the kernel introduced a new feature called per-CPU variables [1]. The closest userspace equivalent would be thread-local storage (TLS) [2].
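The userspace analogy can be sketched with C11 thread-local storage and POSIX threads. This is only an analogy, not kernel code: each thread gets a private copy of the counter and updates it without a lock, much like each CPU owns its copy of a per-CPU variable. The names worker, count_events, NR_WORKERS and NR_EVENTS are made up for this example.

```c
#include <pthread.h>

#define NR_WORKERS 4
#define NR_EVENTS  100000UL

/* C11 thread-local storage: every thread has its own private copy. */
static _Thread_local unsigned long events;

/* Each worker touches only its own copy, so no lock is needed. */
static void *worker(void *arg)
{
	unsigned long *result = arg;

	for (unsigned long i = 0; i < NR_EVENTS; i++)
		events++;

	*result = events;	/* hand the private total back to the caller */
	return NULL;
}

/* Runs all workers and returns the sum of their counters, or 0 on error. */
static unsigned long count_events(void)
{
	pthread_t threads[NR_WORKERS];
	unsigned long results[NR_WORKERS] = { 0 };
	unsigned long total = 0;
	int started = 0;

	while (started < NR_WORKERS) {
		if (pthread_create(&threads[started], NULL, worker,
				   &results[started]))
			break;
		started++;
	}

	for (int i = 0; i < started; i++)
		pthread_join(threads[i], NULL);

	if (started != NR_WORKERS)
		return 0;

	for (int i = 0; i < NR_WORKERS; i++)
		total += results[i];

	return total;
}
```

Calling count_events() yields NR_WORKERS * NR_EVENTS even though the increments are unsynchronized, because no two threads ever share a copy of the counter.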

The idea is that a variable – or even an entire structure – can be tied to a specific CPU and accessed without locking by code running on that CPU. However, this only works if the accessing code is not migrated to another CPU or preempted by another task that might access the same data. To ensure this, preemption must be disabled when accessing per-CPU data from process context.

With preemption disabled, the code cannot be migrated to another CPU, nor can it be preempted by another task on the same CPU. Problem solved.
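In kernel code this pattern might look roughly like the sketch below, which uses the get_cpu_var() and put_cpu_var() helpers to bracket the access. The variable demo_events and the function demo_count_event() are hypothetical names chosen for illustration, and the fragment builds only inside a kernel tree.

```c
#include <linux/percpu.h>

/* Hypothetical per-CPU counter: one instance exists for every CPU. */
static DEFINE_PER_CPU(unsigned long, demo_events);

static void demo_count_event(void)
{
	unsigned long *events;

	/* get_cpu_var() disables preemption and yields this CPU's copy. */
	events = &get_cpu_var(demo_events);
	(*events)++;

	/* put_cpu_var() re-enables preemption. */
	put_cpu_var(demo_events);
}
```

Between the two calls the task can neither migrate nor be preempted, so the section should be kept as short as possible.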

This assumption also holds true for PREEMPT_RT, but it can introduce complications. If the per-CPU section is lengthy or contains many loops, it may interfere with real-time workloads because the scheduler cannot preempt the current context.

Additionally, code within this section must avoid any function that uses a sleeping lock, such as spinlock_t, since sleeping locks depend on the scheduler to resolve contention. This would violate the non-preemptibility requirement of per-CPU data access.

Some sections were rewritten in ways that didn’t affect PREEMPT_RT, but this approach didn’t work for all users of per-CPU data. A new solution was needed – one that followed the original blueprint while adapting to current challenges.

This led to the introduction of local_lock_t, adding yet another locking primitive. While it solved the problem at hand, it also expanded the pool of lock types developers can choose from.
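A minimal sketch of how a per-CPU structure might be protected with local_lock_t is shown below, following the pattern described in the lock type documentation [3]. The structure demo_stats and the function demo_count_event() are hypothetical, and the fragment builds only inside a kernel tree.

```c
#include <linux/local_lock.h>
#include <linux/percpu.h>

/* Hypothetical per-CPU data with its own local lock. */
struct demo_stats {
	local_lock_t lock;
	unsigned long events;
};

static DEFINE_PER_CPU(struct demo_stats, demo_stats) = {
	.lock = INIT_LOCAL_LOCK(lock),
};

static void demo_count_event(void)
{
	/*
	 * Without PREEMPT_RT this disables preemption. With PREEMPT_RT
	 * it takes a per-CPU sleeping lock: the task cannot migrate,
	 * but it remains preemptible.
	 */
	local_lock(&demo_stats.lock);
	this_cpu_inc(demo_stats.events);
	local_unlock(&demo_stats.lock);
}
```

Compared with a bare preempt_disable(), the lock also names the data it protects, which lets LOCKDEP check the usage.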

The growing number of lock types, and the uncertainty around their behavior under PREEMPT_RT, created a need for clear documentation. This effort categorized the existing locking primitives and explained how their semantics change when PREEMPT_RT is enabled [3].

This certainly helped developers understand the locking mechanisms used in the kernel and how PREEMPT_RT leverages them to achieve preemptibility. But is that all? What about the original blueprint?

That blueprint can be summarized as: “All control to the scheduler”. Some of the broader design principles behind how PREEMPT_RT should operate have since been documented and summarized [4].

Does this affect my driver or subsystem?

The short answer: it shouldn’t. Most of the impact is handled by the kernel’s generic debugging infrastructure. Tools like LOCKDEP, or warnings such as “sleep inside atomic section”, will complain loudly whenever a rule is violated.

However, shifting more control into thread context can introduce new challenges. For example, if a user thread runs with the highest priority in the system for extended periods, it can prevent kernel threads from executing. This can disrupt interrupt handlers and timers, since both are threaded under PREEMPT_RT.

Things get even trickier if the application is busy-waiting on a condition that can never be fulfilled – because it’s blocking the very component (like a threaded interrupt) that would satisfy it. Some of these behavioral changes – and how they differ from a non-real-time kernel – have been documented and summarized [5].
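One way to sidestep this trap in an application is to block instead of spinning. The userspace sketch below is an assumption-laden illustration, not a recipe from the documentation: a helper thread stands in for whatever the threaded interrupt handler would eventually trigger, and the names event_source, wait_for_event and the value 42 are invented for the example.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t event_ready;
static int event_value;

/* Stand-in for work that a threaded interrupt handler would complete. */
static void *event_source(void *arg)
{
	(void)arg;
	event_value = 42;
	sem_post(&event_ready);	/* wake up the waiter */
	return NULL;
}

/*
 * Blocks until the event arrives and returns its value, or -1 on error.
 * While blocked, the caller consumes no CPU time, so lower-priority
 * threads get to run. A loop like "while (!event_value);" at the highest
 * real-time priority would keep the CPU busy and could starve them.
 */
static int wait_for_event(void)
{
	pthread_t source;
	int value;

	if (sem_init(&event_ready, 0, 0))
		return -1;

	if (pthread_create(&source, NULL, event_source, NULL)) {
		sem_destroy(&event_ready);
		return -1;
	}

	/* Retry if the wait was interrupted by a signal. */
	while (sem_wait(&event_ready) == -1 && errno == EINTR)
		;

	value = event_value;
	pthread_join(source, NULL);
	sem_destroy(&event_ready);

	return value;
}
```

The waiting thread sleeps inside sem_wait(), which hands the CPU back to the scheduler until the event has actually been delivered.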

Not all architectures support PREEMPT_RT. Some, like x86, have been part of the journey since its early days. Others, such as RISC-V, rely heavily on generic kernel infrastructure, making PREEMPT_RT support almost effortless – they effectively joined the ride before the PREEMPT_RT train reached its main station.

For anyone planning to bring PREEMPT_RT to an unsupported architecture, there’s a checklist that outlines the key steps needed to ensure a smooth integration [6].

Final Remarks

PREEMPT_RT is not a patch born of chaos – it’s the result of a long-standing vision to make Linux truly real-time capable. From its roots in the KURT project to the problem-solving behind newly introduced concepts, each step has been deliberate and technically grounded.

To ensure PREEMPT_RT continues to function as originally envisioned, many of its core ideas have been documented for easy reference. Hopefully, this will encourage more developers to explore its capabilities and consider adopting it in their own systems.

[0] https://git.kernel.org/bigeasy/linux-rt-history/l/v2.6.16-rt29

[1] https://git.kernel.org/torvalds/c/7340a0b15280c9d

[2] https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html

[3] https://docs.kernel.org/locking/locktypes.html

[4] https://docs.kernel.org/core-api/real-time/theory.html

[5] https://docs.kernel.org/core-api/real-time/differences.html

[6] https://docs.kernel.org/core-api/real-time/architecture-porting.html

About the Author

Sebastian Siewior, Linutronix GmbH

Sebastian first started working with PREEMPT_RT in the late v2.6 cycle, fixing bugs and adding features to make life easier. During the v3.10-RT cycle he started maintaining the patch set. Since then he has contributed to various kernel subsystems while shrinking the out-of-tree patch queue.