| .. SPDX-License-Identifier: GPL-2.0 |
| .. Copyright (C) 2022, Google LLC. |
| |
| =================================== |
| The Kernel Memory Sanitizer (KMSAN) |
| =================================== |
| |
| KMSAN is a dynamic error detector aimed at finding uses of uninitialized |
| values. It is based on compiler instrumentation, and is quite similar to the |
| userspace `MemorySanitizer tool`_. |
| |
| Note that KMSAN is not intended for production use, because it drastically |
| increases the kernel memory footprint and slows the whole system down. |
| |
| Usage |
| ===== |
| |
| Building the kernel |
| ------------------- |
| |
| To build a kernel with KMSAN you need a recent Clang (14.0.6 or newer). |
| Please refer to `LLVM documentation`_ for the instructions on how to build Clang. |
| |
| Then configure and build the kernel with ``CONFIG_KMSAN`` enabled. |
| |
| Example report |
| -------------- |
| |
| Here is an example of a KMSAN report:: |
| |
| ===================================================== |
| BUG: KMSAN: uninit-value in test_uninit_kmsan_check_memory+0x1be/0x380 [kmsan_test] |
| test_uninit_kmsan_check_memory+0x1be/0x380 mm/kmsan/kmsan_test.c:273 |
| kunit_run_case_internal lib/kunit/test.c:333 |
| kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374 |
| kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28 |
| kthread+0x721/0x850 kernel/kthread.c:327 |
| ret_from_fork+0x1f/0x30 ??:? |
| |
| Uninit was stored to memory at: |
| do_uninit_local_array+0xfa/0x110 mm/kmsan/kmsan_test.c:260 |
| test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271 |
| kunit_run_case_internal lib/kunit/test.c:333 |
| kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374 |
| kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28 |
| kthread+0x721/0x850 kernel/kthread.c:327 |
| ret_from_fork+0x1f/0x30 ??:? |
| |
| Local variable uninit created at: |
| do_uninit_local_array+0x4a/0x110 mm/kmsan/kmsan_test.c:256 |
| test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271 |
| |
| Bytes 4-7 of 8 are uninitialized |
| Memory access of size 8 starts at ffff888083fe3da0 |
| |
| CPU: 0 PID: 6731 Comm: kunit_try_catch Tainted: G B E 5.16.0-rc3+ #104 |
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 |
| ===================================================== |
| |
| The report says that the local variable ``uninit`` was created uninitialized in |
| ``do_uninit_local_array()``. The third stack trace corresponds to the place |
| where this variable was created. |
| |
| The first stack trace shows where the uninit value was used (in |
| ``test_uninit_kmsan_check_memory()``). The tool shows the bytes which were left |
| uninitialized in the local variable, as well as the stack where the value was |
| copied to another memory location before use. |
| |
| A use of uninitialized value ``v`` is reported by KMSAN in the following cases: |
| |
| - in a condition, e.g. ``if (v) { ... }``; |
| - in an array index or a pointer dereference, e.g. ``array[v]`` or ``*v``; |
| - when it is copied to userspace or hardware, e.g. ``copy_to_user(..., &v, ...)``; |
| - when it is passed as an argument to a function, and |
| ``CONFIG_KMSAN_CHECK_PARAM_RETVAL`` is enabled (see below). |
| |
| Apart from copying data to userspace or hardware, which is a security issue, |
| the cases above are undefined behavior according to the C11 standard. |
| |
| Disabling the instrumentation |
| ----------------------------- |
| |
| A function can be marked with ``__no_kmsan_checks``. Doing so makes KMSAN |
| ignore uninitialized values in that function and mark its output as initialized. |
| As a result, the user will not get KMSAN reports related to that function. |
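| |
| As a minimal sketch of how the attribute is applied (not code from the kernel |
| tree), the fragment below marks a hypothetical helper, ``checksum_raw()``. The |
| empty fallback ``#define`` is an assumption added only so that the fragment |
| also compiles outside the kernel, where the macro is not defined: |
| |
```c
#include <stddef.h>

/*
 * In the kernel the macro is provided by the compiler headers. The empty
 * fallback is an assumption made so that this sketch builds in userspace.
 */
#ifndef __no_kmsan_checks
#define __no_kmsan_checks
#endif

/*
 * Hypothetical helper that sums a buffer which may contain uninitialized
 * padding. With the attribute, KMSAN ignores uninitialized bytes read here
 * and treats the returned sum as initialized.
 */
__no_kmsan_checks
unsigned int checksum_raw(const unsigned char *buf, size_t len)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}
```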
| |
| Another function attribute supported by KMSAN is ``__no_sanitize_memory``. |
| Applying this attribute to a function will result in KMSAN not instrumenting |
| it, which can be helpful if we do not want the compiler to interfere with some |
| low-level code (e.g. code marked with ``noinstr``, which implicitly adds |
| ``__no_sanitize_memory``). |
| |
| This, however, comes at a cost: stack allocations from such functions will have |
| incorrect shadow/origin values, likely leading to false positives. Functions |
| called from non-instrumented code may also receive incorrect metadata for their |
| parameters. |
| |
| As a rule of thumb, avoid using ``__no_sanitize_memory`` explicitly. |
| |
| It is also possible to disable KMSAN for a single file (e.g. ``main.o``):: |
| |
| KMSAN_SANITIZE_main.o := n |
| |
| or for the whole directory:: |
| |
| KMSAN_SANITIZE := n |
| |
| in the Makefile. Think of this as applying ``__no_sanitize_memory`` to every |
| function in the file or directory. Most users won't need ``KMSAN_SANITIZE``, |
| unless their code gets broken by KMSAN (e.g. runs at early boot time). |
| |
| Support |
| ======= |
| |
| In order for KMSAN to work the kernel must be built with Clang, which so far is |
| the only compiler that has KMSAN support. The kernel instrumentation pass is |
| based on the userspace `MemorySanitizer tool`_. |
| |
| The runtime library only supports x86_64 at the moment. |
| |
| How KMSAN works |
| =============== |
| |
| KMSAN shadow memory |
| ------------------- |
| |
| KMSAN associates a metadata byte (also called shadow byte) with every byte of |
| kernel memory. A bit in the shadow byte is set if and only if the corresponding |
| bit of the kernel memory byte is uninitialized. Marking the memory uninitialized |
| (i.e. setting its shadow bytes to ``0xff``) is called poisoning; marking it |
| initialized (setting the shadow bytes to ``0x00``) is called unpoisoning. |
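| |
| The following userspace sketch illustrates poisoning and unpoisoning on a toy |
| shadow array. It is not the KMSAN runtime: the array and all ``toy_*`` names |
| are hypothetical and exist only to make the terminology concrete: |
| |
```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MEM_SIZE 64

/* Toy model: one shadow byte per data byte. All names are hypothetical. */
unsigned char toy_shadow[TOY_MEM_SIZE];

/* Poisoning marks every bit of the range as uninitialized. */
void toy_poison(size_t off, size_t len)
{
	memset(toy_shadow + off, 0xff, len);
}

/* Unpoisoning marks every bit of the range as initialized. */
void toy_unpoison(size_t off, size_t len)
{
	memset(toy_shadow + off, 0x00, len);
}

/* A range counts as poisoned if any of its shadow bits is set. */
bool toy_is_poisoned(size_t off, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (toy_shadow[off + i])
			return true;
	return false;
}
```
| |
| Poisoning eight bytes and then unpoisoning the first four leaves bytes 4-7 |
| poisoned, which is the situation shown in the example report. |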
| |
| When a new variable is allocated on the stack, it is poisoned by default by |
| instrumentation code inserted by the compiler (unless it is a stack variable |
| that is immediately initialized). Any new heap allocation done without |
| ``__GFP_ZERO`` is also poisoned. |
| |
| Compiler instrumentation also tracks the shadow values as they propagate |
| through the code. When needed, instrumentation code invokes the runtime library |
| in ``mm/kmsan/`` to persist shadow values. |
| |
| The shadow value of a basic or compound type is an array of bytes of the same |
| length. When a constant value is written into memory, that memory is unpoisoned. |
| When a value is read from memory, its shadow memory is also obtained and |
| propagated into all the operations which use that value. For every instruction |
| that takes one or more values the compiler generates code that calculates the |
| shadow of the result depending on those values and their shadows. |
| |
| Example:: |
| |
| int a = 0xff; // i.e. 0x000000ff |
| int b; |
| int c = a | b; |
| |
| In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``, |
| shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of |
| ``c`` are uninitialized, while the lower byte is initialized. |
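| |
| The numbers above can be reproduced with a small userspace sketch. The rule |
| below is one way to compute the shadow of a bitwise OR: a result bit is |
| initialized if both input bits are initialized, or if either input bit is an |
| initialized 1. ``toy_shadow_or()`` is a hypothetical name, and the code Clang |
| actually emits may differ: |
| |
```c
#include <stdint.h>

/*
 * Shadow of (a | b), where a set shadow bit means "uninitialized".
 * sa and sb are the shadows of a and b.
 *
 *   sa & sb : both input bits are uninitialized
 *   sa & ~b : a is uninitialized and b does not force the result to 1
 *   sb & ~a : b is uninitialized and a does not force the result to 1
 */
uint32_t toy_shadow_or(uint32_t a, uint32_t sa, uint32_t b, uint32_t sb)
{
	return (sa & sb) | (sa & ~b) | (sb & ~a);
}
```
| |
| With ``a = 0xff`` fully initialized and ``b`` fully uninitialized, the function |
| returns ``0xffffff00`` whatever bits ``b`` happens to contain. |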
| |
| Origin tracking |
| --------------- |
| |
| Every four bytes of kernel memory also have a so-called origin mapped to them. |
| This origin describes the point in program execution at which the uninitialized |
| value was created. Every origin is associated with either the full allocation |
| stack (for heap-allocated memory), or the function containing the uninitialized |
| variable (for locals). |
| |
| When an uninitialized variable is allocated on stack or heap, a new origin |
| value is created, and that variable's origin is filled with that value. When a |
| value is read from memory, its origin is also read and kept together with the |
| shadow. For every instruction that takes one or more values, the origin of the |
| result is one of the origins corresponding to any of the uninitialized inputs. |
| If a poisoned value is written into memory, its origin is written to the |
| corresponding storage as well. |
| |
| Example 1:: |
| |
| int a = 42; |
| int b; |
| int c = a + b; |
| |
| In this case the origin of ``b`` is generated upon function entry, and is |
| stored to the origin of ``c`` right before the addition result is written into |
| memory. |
| |
| Several variables may share the same origin address, if they are stored in the |
| same four-byte chunk. In this case every write to either variable updates the |
| origin for all of them. We have to sacrifice precision in this case, because |
| storing origins for individual bits (and even bytes) would be too costly. |
| |
| Example 2:: |
| |
| int combine(short a, short b) { |
| union ret_t { |
| int i; |
| short s[2]; |
| } ret; |
| ret.s[0] = a; |
| ret.s[1] = b; |
| return ret.i; |
| } |
| |
| If ``a`` is initialized and ``b`` is not, the shadow of the result is |
| ``0xffff0000``, and the origin of the result is the origin of ``b``. |
| ``ret.s[0]`` has the same origin, but it is never used, because that variable |
| is initialized. |
| |
| If both function arguments are uninitialized, only the origin of the second |
| argument is preserved. |
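| |
| The loss of precision can be illustrated with a toy model that keeps one |
| origin slot per four bytes. This is a sketch under simplifying assumptions, not |
| the KMSAN runtime; the arrays and ``toy_*`` functions are hypothetical: |
| |
```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SIZE 16

uint8_t toy_shadow[TOY_SIZE];       /* one shadow byte per data byte */
uint32_t toy_origin[TOY_SIZE / 4];  /* one origin slot per four bytes */

/* Model a 2-byte store at byte offset off (off + 1 < TOY_SIZE). */
void toy_store16(size_t off, uint16_t shadow, uint32_t origin)
{
	toy_shadow[off] = (uint8_t)(shadow & 0xff);
	toy_shadow[off + 1] = (uint8_t)(shadow >> 8);
	/* Only a poisoned store writes its origin to the shared slot. */
	if (shadow)
		toy_origin[off / 4] = origin;
}

/* Origin recorded for the four-byte chunk that contains offset off. */
uint32_t toy_origin_of(size_t off)
{
	return toy_origin[off / 4];
}
```
| |
| Storing an initialized value at offset 0 and a poisoned one at offset 2 leaves |
| the origin of the second store in the shared slot. If both stores are poisoned, |
| the later one overwrites the earlier origin. |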
| |
| Origin chaining |
| ~~~~~~~~~~~~~~~ |
| |
| To ease debugging, KMSAN creates a new origin for every store of an |
| uninitialized value to memory. The new origin references both its creation stack |
| and the previous origin the value had. This may cause increased memory |
| consumption, so we limit the length of origin chains in the runtime. |
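| |
| A chained origin can be pictured as a record that points at its predecessor. |
| The sketch below is a toy model, not the runtime's data structure: the table, |
| the depth limit of 3, and the choice to reuse the previous origin once the |
| limit is reached are all assumptions made for illustration: |
| |
```c
#include <stdint.h>

#define TOY_MAX_ORIGINS 32
#define TOY_MAX_CHAIN_DEPTH 3	/* hypothetical limit */

struct toy_origin {
	uint32_t stack_id;	/* stands in for the creation stack trace */
	uint32_t prev;		/* handle of the previous origin, 0 = none */
	uint32_t depth;		/* number of links behind this origin */
};

struct toy_origin toy_origins[TOY_MAX_ORIGINS];
uint32_t toy_nr_origins = 1;	/* handle 0 is reserved for "no origin" */

/*
 * Create an origin for a store of an uninitialized value. The new origin
 * references both stack_id and prev. Once the chain is too long (or the toy
 * table is full), prev is returned unchanged.
 */
uint32_t toy_chain_origin(uint32_t prev, uint32_t stack_id)
{
	uint32_t depth = prev ? toy_origins[prev].depth + 1 : 0;

	if (depth >= TOY_MAX_CHAIN_DEPTH || toy_nr_origins >= TOY_MAX_ORIGINS)
		return prev;
	toy_origins[toy_nr_origins].stack_id = stack_id;
	toy_origins[toy_nr_origins].prev = prev;
	toy_origins[toy_nr_origins].depth = depth;
	return toy_nr_origins++;
}
```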
| |
| Clang instrumentation API |
| ------------------------- |
| |
| The Clang instrumentation pass inserts calls to functions defined in |
| ``mm/kmsan/instrumentation.c`` into the kernel code. |
| |
| Shadow manipulation |
| ~~~~~~~~~~~~~~~~~~~ |
| |
| For every memory access the compiler emits a call to a function that returns a |
| pair of pointers to the shadow and origin addresses of the given memory:: |
| |
| typedef struct { |
| void *shadow, *origin; |
| } shadow_origin_ptr_t; |
| |
| shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr) |
| shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr) |
| shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, uintptr_t size) |
| shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, uintptr_t size) |
| |
| The function name depends on the memory access size. |
| |
| The compiler makes sure that for every loaded value its shadow and origin |
| values are read from memory. When a value is stored to memory, its shadow and |
| origin are also stored using the metadata pointers. |
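| |
| To make the flow concrete, here is a userspace sketch of what an instrumented |
| 4-byte store and load conceptually do. It is a toy model: ``toy_metadata_ptr()`` |
| merely stands in for the metadata lookup, and the arena, the struct and the |
| helper names are hypothetical rather than part of the KMSAN API: |
| |
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_WORDS 16

/* Toy arena: one shadow byte per data byte, one origin per four bytes. */
uint32_t toy_mem[TOY_WORDS];
uint8_t toy_shadow[TOY_WORDS * 4];
uint32_t toy_origin[TOY_WORDS];

struct toy_meta {
	void *shadow, *origin;
};

/* Stand-in for the metadata lookup; addr must point into toy_mem. */
struct toy_meta toy_metadata_ptr(void *addr)
{
	size_t off = (size_t)((uint8_t *)addr - (uint8_t *)toy_mem);
	struct toy_meta m = { &toy_shadow[off], &toy_origin[off / 4] };

	return m;
}

/* Store: write the shadow (and the origin, if poisoned), then the value. */
void toy_store32(uint32_t *addr, uint32_t val, uint32_t shadow,
		 uint32_t origin)
{
	struct toy_meta m = toy_metadata_ptr(addr);

	memcpy(m.shadow, &shadow, sizeof(shadow));
	if (shadow)
		memcpy(m.origin, &origin, sizeof(origin));
	*addr = val;
}

/* Load: read the shadow and origin together with the value. */
uint32_t toy_load32(uint32_t *addr, uint32_t *shadow, uint32_t *origin)
{
	struct toy_meta m = toy_metadata_ptr(addr);

	memcpy(shadow, m.shadow, sizeof(*shadow));
	memcpy(origin, m.origin, sizeof(*origin));
	return *addr;
}
```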
| |
| Handling locals |
| ~~~~~~~~~~~~~~~ |
| |
| A special function is used to create a new origin value for a local variable and |
| set the origin of that variable to that value:: |
| |
| void __msan_poison_alloca(void *addr, uintptr_t size, char *descr) |
| |
| Access to per-task data |
| ~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| At the beginning of every instrumented function KMSAN inserts a call to |
| ``__msan_get_context_state()``:: |
| |
| struct kmsan_context_state *__msan_get_context_state(void) |
| |
| ``kmsan_context_state`` is declared in ``include/linux/kmsan.h``:: |
| |
| struct kmsan_context_state { |
| char param_tls[KMSAN_PARAM_SIZE]; |
| char retval_tls[KMSAN_RETVAL_SIZE]; |
| char va_arg_tls[KMSAN_PARAM_SIZE]; |
| char va_arg_origin_tls[KMSAN_PARAM_SIZE]; |
| u64 va_arg_overflow_size_tls; |
| char param_origin_tls[KMSAN_PARAM_SIZE]; |
| depot_stack_handle_t retval_origin_tls; |
| }; |
| |
| This structure is used by KMSAN to pass parameter shadows and origins between |
| instrumented functions (unless the parameters are checked immediately by |
| ``CONFIG_KMSAN_CHECK_PARAM_RETVAL``). |
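| |
| The hand-off between caller and callee can be sketched in userspace as follows. |
| This is a simplified model, not generated instrumentation: the state has only |
| two buffers, their size is arbitrary, the callee simply reuses the argument |
| shadow for its result, and all ``toy_*`` names are hypothetical: |
| |
```c
#include <stdint.h>
#include <string.h>

#define TOY_TLS_SIZE 64	/* arbitrary size chosen for the sketch */

struct toy_context_state {
	char param_tls[TOY_TLS_SIZE];	/* shadows of the call arguments */
	char retval_tls[TOY_TLS_SIZE];	/* shadow of the return value */
};

static struct toy_context_state toy_state;

struct toy_context_state *toy_get_context_state(void)
{
	return &toy_state;
}

/* Callee: picks up the shadow of x and publishes the shadow of the result. */
int toy_callee(int x)
{
	struct toy_context_state *st = toy_get_context_state();
	uint32_t sx;

	memcpy(&sx, st->param_tls, sizeof(sx));
	/* Simplification: the result is as (un)initialized as the input. */
	memcpy(st->retval_tls, &sx, sizeof(sx));
	return x + 1;
}

/* Caller: stores the argument shadow before the call, reads it back after. */
int toy_caller(int arg, uint32_t arg_shadow, uint32_t *ret_shadow)
{
	struct toy_context_state *st = toy_get_context_state();
	int ret;

	memcpy(st->param_tls, &arg_shadow, sizeof(arg_shadow));
	ret = toy_callee(arg);
	memcpy(ret_shadow, st->retval_tls, sizeof(*ret_shadow));
	return ret;
}
```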
| |
| Passing uninitialized values to functions |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Clang's MemorySanitizer instrumentation has an option, |
| ``-fsanitize-memory-param-retval``, which makes the compiler check function |
| parameters passed by value, as well as function return values. |
| |
| The option is controlled by ``CONFIG_KMSAN_CHECK_PARAM_RETVAL``, which is |
| enabled by default to let KMSAN report uninitialized values earlier. |
| Please refer to the `LKML discussion`_ for more details. |
| |
| Because of the way the checks are implemented in LLVM (they are only applied to |
| parameters marked as ``noundef``), not all parameters are guaranteed to be |
| checked, so we cannot give up the metadata storage in ``kmsan_context_state``. |
| |
| String functions |
| ~~~~~~~~~~~~~~~~ |
| |
| The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the |
| following functions. These functions are also called when data structures are |
| initialized or copied, making sure shadow and origin values are copied alongside |
| the data:: |
| |
| void *__msan_memcpy(void *dst, void *src, uintptr_t n) |
| void *__msan_memmove(void *dst, void *src, uintptr_t n) |
| void *__msan_memset(void *dst, int c, uintptr_t n) |
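| |
| As an illustration of the idea only, the sketch below copies shadow bytes |
| together with the data inside a toy arena. Origins are left out, the regions |
| are assumed not to overlap, and ``toy_memcpy()`` with its arrays is a |
| hypothetical stand-in, not the runtime implementation: |
| |
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SIZE 32

uint8_t toy_mem[TOY_SIZE];	/* toy data arena */
uint8_t toy_shadow[TOY_SIZE];	/* one shadow byte per data byte */

/*
 * Copy n bytes within toy_mem and carry their shadow along, so that
 * uninitialized source bytes stay marked as uninitialized at the
 * destination. dst and src must point into toy_mem and must not overlap.
 */
void *toy_memcpy(void *dst, const void *src, size_t n)
{
	size_t d = (size_t)((uint8_t *)dst - toy_mem);
	size_t s = (size_t)((const uint8_t *)src - toy_mem);

	memcpy(&toy_shadow[d], &toy_shadow[s], n);
	return memcpy(dst, src, n);
}
```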
| |
| Error reporting |
| ~~~~~~~~~~~~~~~ |
| |
| For each use of a value the compiler emits a shadow check that calls |
| ``__msan_warning()`` if that value is poisoned:: |
| |
| void __msan_warning(u32 origin) |
| |
| ``__msan_warning()`` causes the KMSAN runtime to print an error report. |
| |
| Inline assembly instrumentation |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| KMSAN instruments every inline assembly output with a call to the following |
| function, which unpoisons the memory region:: |
| |
| void __msan_instrument_asm_store(void *addr, uintptr_t size) |
| |
| This approach may mask certain errors, but it also helps to avoid a lot of |
| false positives in bitwise operations, atomics etc. |
| |
| Sometimes the pointers passed into inline assembly do not point to valid memory. |
| In such cases they are ignored at runtime. |
| |
| |
| Runtime library |
| --------------- |
| |
| The code is located in ``mm/kmsan/``. |
| |
| Per-task KMSAN state |
| ~~~~~~~~~~~~~~~~~~~~ |
| |
| Every ``task_struct`` has an associated KMSAN task state that holds the KMSAN |
| context (see above) and a per-task flag that controls whether KMSAN reports are |
| allowed:: |
| |
| struct kmsan_ctx { |
| ... |
| bool allow_reporting; |
| struct kmsan_context_state cstate; |
| ... |
| }; |
| |
| struct task_struct { |
| ... |
| struct kmsan_ctx kmsan; |
| ... |
| }; |
| |
| KMSAN contexts |
| ~~~~~~~~~~~~~~ |
| |
| When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to |
| hold the metadata for function parameters and return values. |
| |
| When the kernel is running in interrupt, softirq or NMI context, where |
| ``current`` is unavailable, KMSAN switches to a per-CPU interrupt state:: |
| |
| DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx); |
| |
| Metadata allocation |
| ~~~~~~~~~~~~~~~~~~~ |
| |
| There are several places in the kernel where the metadata is stored. |
| |
| 1. Each ``struct page`` instance contains two pointers to its shadow and |
| origin pages:: |
| |
| struct page { |
| ... |
| struct page *shadow, *origin; |
| ... |
| }; |
| |
| At boot time, the kernel allocates shadow and origin pages for every available |
| kernel page. This is done quite late, when the kernel address space is already |
| fragmented, so normal data pages may arbitrarily interleave with the metadata |
| pages. |
| |
| This means that in general for two contiguous memory pages their shadow/origin |
| pages may not be contiguous. Consequently, if a memory access crosses the |
| boundary of a memory block, accesses to shadow/origin memory may potentially |
| corrupt other pages or read incorrect values from them. |
| |
| In practice, contiguous memory pages returned by the same ``alloc_pages()`` |
| call will have contiguous metadata, whereas if these pages belong to two |
| different allocations their metadata pages can be fragmented. |
| |
| For the kernel data (``.data``, ``.bss`` etc.) and percpu memory regions |
| there also are no guarantees on metadata contiguity. |
| |
| If ``__msan_metadata_ptr_for_XXX_YYY()`` hits the border between two pages with |
| non-contiguous metadata, it returns pointers to fake shadow/origin regions:: |
| |
| char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE))); |
| char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE))); |
| |
| ``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes. |
| All stores to ``dummy_store_page`` are ignored. |
| |
| 2. For vmalloc memory and modules, there is a direct mapping between the memory |
| range, its shadow and origin. KMSAN reduces the vmalloc area by 3/4, making only |
| the first quarter available to ``vmalloc()``. The second quarter of the vmalloc |
| area contains shadow memory for the first quarter, the third one holds the |
| origins. A small part of the fourth quarter contains shadow and origins for the |
| kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for |
| more details. |
| |
| When an array of pages is mapped into a contiguous virtual memory space, their |
| shadow and origin pages are similarly mapped into contiguous regions. |
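| |
| The direct mapping for vmalloc memory amounts to adding a fixed offset to the |
| address. The sketch below uses a made-up 4 MiB area split into quarters; the |
| base address, the size and the ``toy_*`` helpers are hypothetical, and the real |
| offsets are defined in the header mentioned above: |
| |
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy layout, not the real x86_64 constants. */
#define TOY_VMALLOC_START 0x10000000ULL
#define TOY_VMALLOC_SIZE  0x00400000ULL
#define TOY_QUARTER       (TOY_VMALLOC_SIZE / 4)

/* Only the first quarter is handed out to vmalloc() users. */
bool toy_is_vmalloc_data(uint64_t addr)
{
	return addr >= TOY_VMALLOC_START &&
	       addr < TOY_VMALLOC_START + TOY_QUARTER;
}

/* The second quarter mirrors the first one byte for byte as shadow. */
uint64_t toy_vmalloc_shadow(uint64_t addr)
{
	return addr + TOY_QUARTER;
}

/* The third quarter holds origins, one 4-byte slot per 4 data bytes. */
uint64_t toy_vmalloc_origin(uint64_t addr)
{
	return (addr + 2 * TOY_QUARTER) & ~3ULL;
}
```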
| |
| References |
| ========== |
| |
| E. Stepanov, K. Serebryany. `MemorySanitizer: fast detector of uninitialized |
| memory use in C++ |
| <https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf>`_. |
| In Proceedings of CGO 2015. |
| |
| .. _MemorySanitizer tool: https://clang.llvm.org/docs/MemorySanitizer.html |
| .. _LLVM documentation: https://llvm.org/docs/GettingStarted.html |
| .. _LKML discussion: https://lore.kernel.org/all/20220614144853.3693273-1-glider@google.com/ |