In the landscape of Linux Kernel development, there are certain languages, software environments, and tools that every developer should be well-versed in. This article delves into these essential elements.
Primarily, the Linux Kernel is written in C, so a solid command of the language is crucial. The kernel targets GNU C, a dialect that extends standard C with additional attributes, keywords, and built-ins; it was originally buildable only with GCC, although Clang/LLVM is now supported too. To read the code effectively, it’s advisable to learn a modern version of C, such as C11, along with the GNU extensions.
Additionally, assembly language is used for specific parts of the kernel and for optimized segments of some drivers, particularly architecture-specific ones. Which assembly language you need depends on your hardware platform, with x86, ARM, and RISC-V being the most prevalent.
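To make the GNU extensions more concrete, here is a minimal userspace sketch of three constructs that appear throughout kernel code: a branch hint built on __builtin_expect, a statement expression combined with typeof, and a function attribute. The names max_once, clamp_to_limit, and max_with_side_effect are hypothetical and chosen for illustration; the kernel has its own, more elaborate macros.

```c
#include <stddef.h>

/* Branch-prediction hint built on the GNU __builtin_expect intrinsic. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Type-generic max: the statement expression ({ ... }) and __typeof__
 * are GNU extensions; each argument is evaluated exactly once.
 */
#define max_once(a, b) ({               \
        __typeof__(a) _a = (a);         \
        __typeof__(b) _b = (b);         \
        _a > _b ? _a : _b; })

/* GNU attribute: warn when a caller ignores the return value. */
__attribute__((warn_unused_result))
static int clamp_to_limit(int value, int limit)
{
        if (unlikely(limit < 0))        /* treated as the rare path */
                return 0;
        return value > limit ? limit : value;
}

/* Shows that max_once() does not evaluate i++ twice. */
static int max_with_side_effect(void)
{
        int i = 1;
        int m = max_once(i++, 0);       /* m is 1, i becomes 2 */

        return m * 10 + i;
}
```

This compiles with GCC or Clang in their default GNU modes; a strictly conforming ISO C compiler would reject the statement expression.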
Rust is also gaining traction in the Linux Kernel community as a safer and more reliable alternative to C.
Understanding Kbuild and Make, the backbone of Linux’s configurability, is essential for successfully modifying or extending kernel code. Moreover, shell scripting knowledge is indispensable for automating repetitive tasks and working efficiently on the command line.
Git, the source control system, is indispensable for Linux Kernel development. It’s hard to conceive the kernel development workflow without it.
For developers not working on specific or customized hardware, emulation is the go-to solution, with QEMU/KVM being the most popular platform. Emulation significantly reduces development time because a crashed or misbehaving kernel can be fixed, rebuilt, and retested inside a virtual machine without touching real hardware.
Kernel debugging capabilities are limited, and printk calls remain the most popular debugging method. However, the in-kernel ftrace framework, introduced during the 2.6 series (in 2.6.27), has evolved into a robust and comprehensive solution offering numerous tracers and output formats. Knowledge of ftrace is a must for modern kernel developers.
When dealing with slow kernel modules, perf comes to the rescue. It’s an in-kernel profiling framework coupled with a userspace tool for analyzing in-kernel performance. For gathering kernel runtime information, the eBPF framework is the most flexible and sophisticated tool. It revolutionizes kernel observability by enabling user-defined kernel telemetry.
Embedded development is a major area of kernel development, with many embedded devices, from smart-home IoT gadgets to Android-based smartphones, carrying a Linux Kernel variant. The two main build systems in the embedded world are Buildroot and Yocto. While Buildroot is simpler, Yocto offers more flexibility. Both are designed to build a highly customized Linux distribution tailored to specific hardware boards.
For embedded developers, the ability to create and update device tree source (.dts) files, which describe the hardware components on a board, is essential. Knowledge of U-Boot, the primary bootloader in the embedded world, is also vital.
Concerning the dev environment, most Linux Kernel developers prefer using the vim (or Emacs) text editor in the terminal, tmux as a terminal multiplexer, and cscope for building a cross-reference of the kernel source code.
In the realm of Linux Kernel development, technical expertise can be broadly divided into two categories: general and domain-specific. General skills are universally applicable to all kernel developers, while domain-specific skills apply to developers working in specific areas like networking, storage, virtualization, cryptography, embedded systems, etc. Given the vastness of the Linux Kernel, it’s practically impossible for a developer to master every part with equal proficiency.
Let’s begin with the general skills:
Adherence to Kernel Coding Style – The Linux Kernel has its unique coding style, which may slightly differ across subsystems. Regularly checking your code with the scripts/checkpatch.pl script within the kernel source code tree is a good practice.
Employment of Kernel Coding Patterns – Certain coding patterns are recommended for use in the Kernel. A notable example is the use of the goto statement for error handling in multi-step initialization: when a step fails, the code jumps to a label that releases the already acquired resources in reverse order.
Familiarity with Kernel Internal Data Structures – The kernel ships generic implementations of several pivotal data structures, and every developer should be aware of them. These include singly and doubly linked lists, queues, hash tables, binary trees, red-black trees, maple trees, etc.
Understanding of Synchronization Primitives – With SMP and multicore CPUs now ubiquitous, kernel developers must write code that is safe under concurrency. The Linux Kernel offers numerous synchronization primitives for different purposes, such as atomic operations, spin locks, semaphores, mutexes, RCU (read-copy-update, a mechanism that lets readers proceed without taking locks), etc.
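The goto pattern can be sketched in plain userspace C as follows. This is an illustration under stated assumptions, not code from the kernel: struct demo_dev, demo_alloc(), and the fail_step parameter are hypothetical, and malloc()/free() stand in for the kernel allocators a real driver would use.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device context with three resources acquired in order. */
struct demo_dev {
        char *regs;
        char *rx_buf;
        char *tx_buf;
};

/* Hypothetical helper: fails on purpose when step == fail_step. */
static void *demo_alloc(size_t size, int step, int fail_step)
{
        return step == fail_step ? NULL : malloc(size);
}

/* Returns 0 on success or -ENOMEM; on failure nothing stays allocated. */
static int demo_dev_init(struct demo_dev *dev, int fail_step)
{
        *dev = (struct demo_dev){ 0 };

        dev->regs = demo_alloc(64, 1, fail_step);
        if (!dev->regs)
                goto err_out;

        dev->rx_buf = demo_alloc(256, 2, fail_step);
        if (!dev->rx_buf)
                goto err_free_regs;

        dev->tx_buf = demo_alloc(256, 3, fail_step);
        if (!dev->tx_buf)
                goto err_free_rx;

        return 0;

        /* Unwind in reverse order of acquisition. */
err_free_rx:
        free(dev->rx_buf);
        dev->rx_buf = NULL;
err_free_regs:
        free(dev->regs);
        dev->regs = NULL;
err_out:
        return -ENOMEM;
}

/* Counterpart of demo_dev_init() for the success case. */
static void demo_dev_exit(struct demo_dev *dev)
{
        free(dev->tx_buf);
        free(dev->rx_buf);
        free(dev->regs);
}
```

Each label undoes exactly the steps that succeeded before the failure, so the function has a single, easy-to-audit cleanup path instead of duplicated free() calls in every branch.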
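Kernel lists are intrusive: the link node is embedded in the payload structure, and the payload is recovered from the node with pointer arithmetic. The following is a simplified userspace re-creation of that idea, not the kernel header itself; struct packet and total_len() are hypothetical names used for the demonstration.

```c
#include <stddef.h>

/* Node of a circular doubly linked list, embedded in the payload. */
struct list_head {
        struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

/* An empty list is a head that points to itself. */
static void list_init(struct list_head *head)
{
        head->next = head;
        head->prev = head;
}

/* Insert node just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
        node->prev = head->prev;
        node->next = head;
        head->prev->next = node;
        head->prev = node;
}

/* Unlink node and leave it pointing to itself. */
static void list_del(struct list_head *node)
{
        node->prev->next = node->next;
        node->next->prev = node->prev;
        node->next = node;
        node->prev = node;
}

/* Hypothetical payload type carrying an embedded list node. */
struct packet {
        int len;
        struct list_head link;
};

/* Walk the list and sum the payload lengths. */
static int total_len(struct list_head *head)
{
        struct list_head *pos;
        int sum = 0;

        for (pos = head->next; pos != head; pos = pos->next)
                sum += container_of(pos, struct packet, link)->len;
        return sum;
}
```

Because the node lives inside the payload, one list implementation serves every structure type without extra allocations.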
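As a rough userspace analogy for two of these primitives, the sketch below builds a spin lock and a reference counter from C11 atomics. It assumes a C11 compiler with stdatomic.h; all demo_* names are hypothetical, and the real kernel primitives additionally deal with preemption, interrupts, and fairness, which this sketch ignores.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a spin lock, built on a C11 atomic flag. */
struct demo_spinlock {
        atomic_flag locked;
};

static void demo_spin_init(struct demo_spinlock *lock)
{
        atomic_flag_clear(&lock->locked);
}

/* Returns true if the lock was free and is now held by the caller. */
static bool demo_spin_trylock(struct demo_spinlock *lock)
{
        return !atomic_flag_test_and_set_explicit(&lock->locked,
                                                  memory_order_acquire);
}

/* Busy-wait until the lock becomes available. */
static void demo_spin_lock(struct demo_spinlock *lock)
{
        while (!demo_spin_trylock(lock))
                ;
}

static void demo_spin_unlock(struct demo_spinlock *lock)
{
        atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

/* Atomic counter: safe to update from several threads without a lock. */
static atomic_int demo_refcount;

/* Both helpers return the counter value after the update. */
static int demo_get(void)
{
        return atomic_fetch_add(&demo_refcount, 1) + 1;
}

static int demo_put(void)
{
        return atomic_fetch_sub(&demo_refcount, 1) - 1;
}
```

A spin lock suits very short critical sections where sleeping is not allowed; for longer sections that may sleep, a mutex is the usual choice.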
Interrupt Handling: Top and Bottom Halves – The Linux Kernel employs a unique interrupt handling scheme involving top and bottom halves. The top half quickly handles an interrupt and returns, while the bottom half does deferred work handling the results delivered by the top half. Every developer should understand this scheme and design their interrupt handlers accordingly.
Deferred Work – Often, part of a job in the Linux Kernel is postponed until later. There are several deferred work mechanisms in the kernel for different situations, such as softirqs, tasklets, and workqueues (which replaced the older task queues).
Memory Management – Kernel developers should be familiar with the main layers of kernel memory allocation: the page allocator, which hands out whole pages; the slab layer built atop it, which keeps caches of equally sized objects to reduce fragmentation and allocation cost; and the general-purpose kmalloc/kfree functions, which are themselves served from slab caches.
Virtual File System – Irrespective of the type of underlying filesystem (ext3, ext4, ZFS, Lustre, XFS, etc.), the kernel maintains a universal interface over it. A general understanding of VFS is beneficial as filesystem interaction is a common communication method between the kernel and userspace.
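The split between a fast interrupt path and deferred processing can be imitated in userspace with a small queue of callbacks. This is only a single-threaded sketch under simplifying assumptions: struct work_item, queue_work(), run_pending(), and struct demo_nic are hypothetical names, and the kernel versions add locking, per-CPU queues, and worker threads.

```c
#include <stddef.h>

struct work_item;
typedef void (*work_fn)(struct work_item *work);

/* A unit of deferred work, embedded in the object it belongs to. */
struct work_item {
        work_fn fn;
        int pending;                    /* already on a queue? */
        struct work_item *next;
};

struct work_queue {
        struct work_item *head, *tail;
};

/* Queue work unless it is already pending; returns 1 if it was queued. */
static int queue_work(struct work_queue *q, struct work_item *w, work_fn fn)
{
        if (w->pending)
                return 0;
        w->fn = fn;
        w->pending = 1;
        w->next = NULL;
        if (q->tail)
                q->tail->next = w;
        else
                q->head = w;
        q->tail = w;
        return 1;
}

/* Run all queued items in FIFO order; returns how many were run. */
static int run_pending(struct work_queue *q)
{
        int count = 0;

        while (q->head) {
                struct work_item *w = q->head;

                q->head = w->next;
                if (!q->head)
                        q->tail = NULL;
                w->pending = 0;
                w->fn(w);
                count++;
        }
        return count;
}

/* Hypothetical network device with an embedded work item. */
struct demo_nic {
        int rx_pending;
        int rx_processed;
        struct work_item rx_work;
};

/* Deferred part: does the slow processing later. */
static void demo_rx_work(struct work_item *work)
{
        struct demo_nic *nic = (struct demo_nic *)
                ((char *)work - offsetof(struct demo_nic, rx_work));

        nic->rx_processed += nic->rx_pending;
        nic->rx_pending = 0;
}

/* Fast part: only records the event and schedules the deferred part. */
static void demo_irq(struct work_queue *q, struct demo_nic *nic, int frames)
{
        nic->rx_pending += frames;
        queue_work(q, &nic->rx_work, demo_rx_work);
}
```

Two events arriving before the queue runs are coalesced into a single execution of the handler, which is also how a pending kernel work item behaves.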
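The core idea behind a slab-style cache, keeping freed objects of one size on a free list so they can be reused cheaply, can be sketched in userspace as shown here. This is a deliberately simplified model with hypothetical names (struct obj_cache, cache_alloc(), and so on); malloc() stands in for the underlying page allocator, and real slab caches manage whole pages, per-CPU lists, and constructors.

```c
#include <stdlib.h>

/* Free objects are chained through their own storage. */
struct free_obj {
        struct free_obj *next;
};

/* A cache that serves objects of exactly one size. */
struct obj_cache {
        size_t obj_size;
        struct free_obj *free_list;
        unsigned long heap_allocs;      /* times the backing allocator was hit */
};

static void cache_init(struct obj_cache *c, size_t obj_size)
{
        /* An object must be large enough to hold the free-list link. */
        c->obj_size = obj_size < sizeof(struct free_obj) ?
                      sizeof(struct free_obj) : obj_size;
        c->free_list = NULL;
        c->heap_allocs = 0;
}

/* Reuse a cached object if possible, otherwise fall back to malloc(). */
static void *cache_alloc(struct obj_cache *c)
{
        struct free_obj *obj = c->free_list;

        if (obj) {
                c->free_list = obj->next;
                return obj;
        }
        c->heap_allocs++;
        return malloc(c->obj_size);
}

/* Return an object to the cache instead of releasing it. */
static void cache_free(struct obj_cache *c, void *ptr)
{
        struct free_obj *obj = ptr;

        obj->next = c->free_list;
        c->free_list = obj;
}

/* Release every cached object back to the backing allocator. */
static void cache_destroy(struct obj_cache *c)
{
        while (c->free_list) {
                struct free_obj *obj = c->free_list;

                c->free_list = obj->next;
                free(obj);
        }
}
```

Because every object in a cache has the same size, a freed slot always fits the next request, which is what keeps fragmentation low.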
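A universal interface over different filesystems is typically achieved with a table of function pointers that each filesystem fills in. The sketch below shows that dispatch in userspace; struct demo_file_ops, demo_vfs_read(), and the two toy filesystems are hypothetical and only loosely modeled on the kernel structures.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct demo_file;

/* Per-filesystem operations table. */
struct demo_file_ops {
        long (*read)(struct demo_file *file, char *buf, size_t len);
};

struct demo_file {
        const struct demo_file_ops *ops;
        const char *data;               /* backing store of the toy ramfs */
        size_t pos;
};

/* Generic entry point: callers never see which filesystem is behind it. */
static long demo_vfs_read(struct demo_file *file, char *buf, size_t len)
{
        if (!file->ops || !file->ops->read)
                return -EINVAL;         /* operation not supported */
        return file->ops->read(file, buf, len);
}

/* Toy "ramfs": reads from an in-memory string and advances the position. */
static long ramfs_read(struct demo_file *file, char *buf, size_t len)
{
        size_t left = strlen(file->data) - file->pos;
        size_t n = len < left ? len : left;

        memcpy(buf, file->data + file->pos, n);
        file->pos += n;
        return (long)n;
}

/* Toy "zerofs": always fills the buffer with zero bytes. */
static long zerofs_read(struct demo_file *file, char *buf, size_t len)
{
        (void)file;
        memset(buf, 0, len);
        return (long)len;
}

static const struct demo_file_ops ramfs_ops = { .read = ramfs_read };
static const struct demo_file_ops zerofs_ops = { .read = zerofs_read };
```

Adding another filesystem only requires a new operations table; the generic read path stays unchanged.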
Scheduler – The scheduler manages all processes in the operating system, both kernel and userspace. Developers should understand its basics.
System Call Interface – The primary communication method between the kernel and userspace is the system call interface. While the libc library in userspace encapsulates it and provides more user-friendly functions, there are times when a system call needs to be invoked directly. This skill is useful for both userspace and kernel programmers.
/sys and /proc Directories – These pseudo-filesystems (sysfs and procfs) are the second most popular way of interaction between the kernel and userspace. It’s worth learning how they are structured.
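On Linux with glibc, a direct invocation can be made through the generic syscall() wrapper, as in this minimal sketch. The helper names raw_getpid() and raw_write() are hypothetical; SYS_getpid and SYS_write are the system call numbers exported by sys/syscall.h.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes syscall() in unistd.h on glibc */
#endif
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel for the process ID without the libc getpid() wrapper. */
static long raw_getpid(void)
{
        return syscall(SYS_getpid);
}

/* Write a string to a file descriptor via the raw write system call. */
static long raw_write(int fd, const char *msg)
{
        return syscall(SYS_write, fd, msg, strlen(msg));
}
```

Calling raw_write(1, "ok\n") sends three bytes to standard output and returns the number of bytes written, or -1 with errno set on failure, which is the convention syscall() follows for errors.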
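Many files under /proc are plain text made of "key: value" lines, so reading them needs nothing beyond ordinary file I/O. The sketch below reads /proc/self/status and extracts a numeric field; the helper names proc_parse_field() and proc_read_self_status() are hypothetical, and the parser assumes the value is an integer that follows the colon.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find a line starting with "key:" in /proc-style text and parse the
 * integer that follows. Returns 0 on success, -1 if the key is missing
 * or its value is not a number.
 */
static int proc_parse_field(const char *text, const char *key, long *value)
{
        size_t klen = strlen(key);
        const char *line = text;

        while (line && *line) {
                if (strncmp(line, key, klen) == 0 && line[klen] == ':')
                        return sscanf(line + klen + 1, "%ld", value) == 1 ?
                               0 : -1;
                line = strchr(line, '\n');
                if (line)
                        line++;         /* step past the newline */
        }
        return -1;
}

/* Read /proc/self/status into buf; returns bytes read or -1 on error. */
static long proc_read_self_status(char *buf, size_t size)
{
        FILE *fp = fopen("/proc/self/status", "r");
        size_t n;

        if (!fp)
                return -1;
        n = fread(buf, 1, size - 1, fp);
        buf[n] = '\0';
        fclose(fp);
        return (long)n;
}
```

A caller would fill a buffer with proc_read_self_status() and then query fields such as "VmRSS" or "Threads" with proc_parse_field().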
Loadable Kernel Modules – Many Linux Kernel Developers develop drivers for new hardware devices, which are created in the form of Loadable Kernel Modules. These special format binaries can be loaded/unloaded without rebooting the system. Developers should know the structure of the kernel modules in general, additional rules for character, block, and network devices, and the ways of communication with userspace.
Udev – Driver developers should be aware of the Udev subsystem that provides infrastructure to run user scripts when a device is hotplugged.
Fault Injection Framework – This allows testing rarely executed code paths by injecting errors into functions that almost always succeed, such as kmalloc in memory allocation.
Kernel Sanitizers – Tools like KASAN and KMSAN dynamically catch problems such as memory corruption and the use of uninitialized memory. It’s worth loading new kernel modules and running workloads on a sanitizer-enabled kernel to catch subtle, dynamic bugs.
Locking Correctness Validator – Parts of the kernel/module code implementing complex locking schemes are prone to deadlocks and livelocks. The lockdep runtime validator tracks the order in which locks are taken and reports orderings that could deadlock, even if no deadlock actually occurred during the run, saving hours of debugging effort.
Kdump/Kexec – These are particularly useful when a crash is almost impossible to debug on the live system. When the kernel panics, kexec boots a second, preloaded crash kernel, which captures a memory dump of the crashed kernel for further analysis.
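The principle can be shown with a userspace imitation: an allocator wrapper that fails on a configurable schedule so that the caller’s error path actually runs. This is not the kernel framework’s API; struct fault_cfg, fi_should_fail(), fi_alloc(), and fi_dup() are hypothetical names, and the only policy modeled is "fail every Nth call".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical injection settings for one call site. */
struct fault_cfg {
        unsigned int interval;  /* fail every Nth call; 0 disables injection */
        unsigned int calls;     /* calls seen so far */
        unsigned int injected;  /* failures injected so far */
};

/* Decide whether the current call should fail artificially. */
static int fi_should_fail(struct fault_cfg *cfg)
{
        if (!cfg->interval)
                return 0;
        if (++cfg->calls % cfg->interval)
                return 0;
        cfg->injected++;
        return 1;
}

/* Allocator wrapper with an injection point in front of malloc(). */
static void *fi_alloc(struct fault_cfg *cfg, size_t size)
{
        if (fi_should_fail(cfg))
                return NULL;
        return malloc(size);
}

/* Code under test: must cope with the allocation failing. */
static char *fi_dup(struct fault_cfg *cfg, const char *src, size_t len)
{
        char *copy = fi_alloc(cfg, len);

        if (!copy)
                return NULL;    /* the rarely taken path being exercised */
        memcpy(copy, src, len);
        return copy;
}
```

With interval set to 3, the first two calls to fi_dup() succeed and the third returns NULL, which makes the failure path reproducible in a test.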